ABSTRACT


This  report  discusses  a  research  effort  whose  object  is  the  design 
of  a  handprint  alpha-numeric  reader  capable  of  human  recognition  rates. 
The  effort  included: 

1.  Collection of a large data base of unconstrained handprinted
alpha-numeric characters.

2.  Editing  the  data  to  re-label  mislabeled  characters,  remove 
noise,  and  delete  totally  illegible  characters.  A  separate  programming 
package  (the  Data  Base  Editing  Package)  was  designed  and  implemented 
for  this  purpose. 

3.  The design of new features to improve the performance of
the system.

4.  The design of the recognition logic using an OLPARS-like program,
the Alpha-Numeric Logic Package (ANLP), which used the expanded
feature set.

5.  An independent test of the logic using a set of 6127 characters
not included in the design set.

6.  Analysis of the results of the independent test to develop
reject strategies to reduce substitution errors.




TECHNICAL  EVALUATION 

This effort represents a significant advance in the area of
optical character recognition. The recognition logic developed
was designed on a 33 thousand character data base, one of the
largest if not the largest data base ever compiled for this purpose.
The accuracy of the logic approaches human recognition
capabilities. In fact, when the ability of the human to employ
syntactical information to distinguish malformed characters is
negated, the performance of the recognition logic becomes equivalent
with that of the human.

The character rejection rate, which was found to be rather
high, can be traced to specific problems with source data preparation.
The necessity of exercising some controls on data
preparation is very evident. The implementation of minimal
printing constraints will reduce the substitution rate to 1%.

This  is  quite  significant  in  view  of  the  fact  that  the  error 
rate  of  unverified  commercial  keypunching  is  approximately  5%. 

For  all  practical  purposes  the  technique  has  been  proven. 

The follow-on to this work should be the implementation and
integration of hardware and software into a final system. It
should be noted that there are many variables (degree of
constraints, size of the data set, substitution and reject requirements,
etc.) which influence the complexity and performance of a
handprinted character reading system. For this reason, there
exists the possibility for a family of readers, each tailored to a
particular set of these variables. It follows that the best
approach to a follow-on is not one directed at a general purpose
reader but at readers customized for specific applications.
SECTION 1


INTRODUCTION  AND  SUMMARY 


A researcher unfamiliar with the variation in unconstrained handprinting
might easily underestimate the amount of sophistication needed in
the  design  of  an  alpha-numeric  reader.  A  great  degree  of  complexity  is 
necessary  to  approach  human  recognition  rates  in  any  automated  system. 

The  reason  is  that  natural  variations  in  shapes  and/or  breaks  in  a  character 
cause  a  "continuum"  of  character  shapes,  starting  with  a  character  in  one 
class  and  ending  with  a  character  in  another.  Figure  1-1  gives  a  few 
examples  of  this. 

The variations shown in Figure 1-1 occur frequently enough in
unconstrained handprinting to ensure that substitution errors will occur.

Our goal was to design and implement a system which would minimize
substitution errors, i.e., reduce the substitution rate to the rate
one might expect from a human attempting to classify the characters.

To achieve this goal the following steps were taken:

1.  Collection of a large data base of unconstrained handprinted
alpha-numeric characters.

2.  Editing  the  data  to  re-label  mislabeled  characters,  remove 
noise  and  delete  totally  illegible  characters.  A  separate 
programming  package  (The  Database  Editing  Package)  was 
designed  and  implemented  for  this  purpose. 

3.  The  design  of  new  features  to  improve  the  performance  of 
the  system. 

4.  The design of the recognition logic using an OLPARS-like
program, the Alpha-Numeric Logic Package (ANLP), which
used the expanded feature set.

5.  An independent test of the logic using a set of 6127 characters
not included in the design set.

6.  Analysis  of  the  results  of  the  independent  test  to  develop 
reject  strategies  to  reduce  substitution  errors. 
In summary, the logic, designed on an edited data base of 33,128
characters, was tested using four different reject strategies, D, C, B,
and A. The test set consisted of 6127 unedited characters.

Using reject strategy D (which rejects a character only in the case
of ties), a substitution rate of 18.67% and a reject rate of 2.01% were
observed on the test set data. Using reject strategy C (which rejects a
character if the maximum class receives 32 or fewer votes or if there is
a tie), a substitution rate of 12.87% and a reject rate of 6.92% were
observed on the test set. Using reject strategy B (which rejects a
character if the maximum class receives 33 or fewer votes or if there is
a tie), a substitution rate of 11.45% and a reject rate of 10.18% were
observed on the test data. Using reject strategy A (which rejects a
character if the maximum class receives 34 or fewer votes or if there is
a tie), a substitution rate of 9.14% and a reject rate of 16.97% were
observed on the test data. As pointed out before, confusion pairs
accounted for all but 1% of the substitutions using reject strategy A.
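
The four reject strategies differ only in the vote threshold applied to the winning class. The following Python sketch illustrates the rule as stated above; it is not the ANLP code. The input format (a mapping from class label to vote count) and the function name are assumptions made for illustration.

```python
from collections import Counter

# Vote thresholds as stated in the summary; None means "reject on ties only".
THRESHOLDS = {"A": 34, "B": 33, "C": 32, "D": None}


def classify_with_reject(votes, strategy):
    """Return the winning class label, or None if the character is rejected.

    votes: non-empty mapping from class label to vote count
           (a hypothetical input format, not taken from the report).
    """
    ranked = Counter(votes).most_common(2)
    best_label, best_votes = ranked[0]
    # A tie for the maximum always causes a reject.
    if len(ranked) > 1 and ranked[1][1] == best_votes:
        return None
    threshold = THRESHOLDS[strategy]
    # Strategies A, B and C also reject when the winner has too few votes.
    if threshold is not None and best_votes <= threshold:
        return None
    return best_label
```

With a winning class holding 33 votes, for instance, strategies D and C accept the character while B and A reject it, which is the mechanism behind the trade-off between substitution rate and reject rate reported above.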

A substitution rate of 1% with a reject rate of 16.97% on unconstrained
alpha-numerics compares favorably with human performance and
represents a significant advance in the field of OCR. To our knowledge there
is no other alpha-numeric reader in existence which achieves these results
on unconstrained data. We feel that the system with reject strategy A
operates well enough to be of practical value as long as either: (1) certain
unresolvable pairs are not permitted in the same fields, or (2) some
constraints are placed on members of these pairs.
SECTION 2


THE  STANDARD  FEATURES 


2.1.  OVERVIEW

The alpha-numeric feature set consists of up to 84 standard
features developed under a previous contract plus the 8 special features
described in Section 3.

The standard feature set consists of five measurements
$M(C) = (a_1, \ldots, a_5)$ made on each of the convexities C of the left,
right, top and bottom contours. Since each contour is forced to have 1, 3,
or 5 convexities, this results in a maximum of 21* standard measurements
for each contour and thus a maximum of 84 standard measurements from each
character.


2.2.  PRE-PROCESSING


The first operation carried out was that of converting each character
from its octal format to a 24 x 24 raster. (We define the character matrix
$A = (a_{ij})$, $i, j = 1, \ldots, 24$, associated with a character by
$a_{ij} = 1$ if a character mark is present in position $(i, j)$ of the
raster and $a_{ij} = 0$ if not; where it is assumed that the rows of the
raster are numbered from top to bottom and the columns from left to right
(unless otherwise stated).)


Next, a height normalization was performed. Each character was
"stretched" in the vertical direction so that it extended from the bottom row
(row 24) to the top row (row 1) of the raster (Figure 2-1). The height
normalization operates in three steps. First, the character is moved to the
bottom of the original raster. Next, the height of the character is computed
and designated symbolically as H. Finally, the character is stretched by
the expansion factor M = 24/H as shown in Figure 2-2.
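
As a rough illustration of the stretching step, the sketch below resamples the occupied rows of a raster onto all 24 rows. The exact resampling rule of Figure 2-2 is not reproduced here, so the nearest-row rule used is an assumption, as are the function name and the list-of-lists raster format. Sampling directly from the occupied rows has the same effect as first moving the character to the bottom of the raster.

```python
def normalize_height(raster, size=24):
    """Stretch a binary raster vertically so the character spans all rows.

    raster: list of `size` rows, each a list of 0/1 values, row 0 at the top
            (an assumed in-memory format).
    """
    marked = [r for r, row in enumerate(raster) if any(row)]
    if not marked:
        return [row[:] for row in raster]  # blank raster: nothing to stretch
    top, bottom = marked[0], marked[-1]
    height = bottom - top + 1              # H in the text
    # Expansion factor M = size / H. Output row r copies source row
    # top + floor(r / M) = top + (r * H) // size (assumed nearest-row rule).
    return [raster[top + (r * height) // size][:] for r in range(size)]
```

A character occupying six rows, for example, has M = 4, so each of its rows is repeated four times in the normalized raster.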


After height normalization, the left and right histogram vectors,
$L = (L_1, L_2, \ldots, L_{24})$ and $R = (R_1, R_2, \ldots, R_{24})$
were calculated, where


*  -  Due  to  certain  redundancies,  4  of  the  25  measurements  taken  from  five 
convexities  are  eliminated. 


Figure 2-1.  Character raster before normalization and after normalization.


$L_i$ ($R_i$) represents the distance from the left (right) margin to the
character along the $i$-th row. The top and bottom histogram vectors T and B
are calculated in a similar fashion after rotating the raster 90°
counterclockwise. Figures 2-3, a-c illustrate the four histogram vectors.
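
A minimal Python sketch of the left and right histogram computation follows. It assumes that distances are counted as 1-based positions from the respective margin and that a row with no character mark receives the value 25, as the break-editing step later in this section suggests. The function name and raster format are illustrative only.

```python
def left_right_histograms(raster, empty=25):
    """Return (L, R) for a binary raster given as a list of rows of 0/1.

    L[i]: position of the first mark in row i, counted from the left margin.
    R[i]: position of the last mark in row i, counted from the right margin.
    Rows without any mark get the value `empty` (assumed to be 25).
    """
    left, right = [], []
    for row in raster:
        cols = [j for j, bit in enumerate(row) if bit]
        if cols:
            left.append(cols[0] + 1)           # counted from the left margin
            right.append(len(row) - cols[-1])  # counted from the right margin
        else:
            left.append(empty)
            right.append(empty)
    return left, right
```

The top and bottom vectors can be obtained by applying the same function to the rotated raster.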

The histogram vectors of a character define the contours which are
used for character recognition. It seemed that the left and right histograms
contained enough information for human recognition of the numeric characters
in almost every case. It was felt that the left and right, as well as the
upper and lower histograms would suffice for separation of the
alpha-numerics, except for a few confusion cases.

A straightforward pattern analysis (using the Non-Linear Mapping
Algorithm of OLPARS) of the left and right histogram vectors was performed
and some partial separation was achieved. For the most part, however, the
results indicated that cluster analysis of the histogram vector (with Euclidean
metric) was not the correct method of utilizing the "shape" information
inherent in the vector. Analysis of the situation led to the following
explanation. The (left) histograms in Figures 2-4.a and 2-4.b are easily
recognized as belonging to the character 2. Intuitively, we feel that they
have similar shapes in that they both have two "bumps" separated by a
vertical section. Their vectors, however, are no closer to each other than
they are to the left histogram in Figure 2-4.c, which belongs to a 1. This is
because the "shape" quality of the histogram is of a statistical nature so
that comparing the two vectors coordinate-by-coordinate (as we do in
measuring Euclidean distance) does not measure closeness of "shape". For
example, the vectors of Figures 2-4.a and 2-4.b are similar in shape because
of the "statistical"-type of statement: "the coordinates decrease gradually,
then a sharp increase occurs, then the coordinates are constant, followed by
a slight decrease, followed by a sharp increase." This type of word
description suggested the approximation of the histogram with line segments
having quantized slopes and was the motivation behind the string
representation of a wave.


Directed line segments having any of five slopes were used, the
five directions being 270°, denoted by V; 0°, denoted by +H; 180°, denoted
by -H; 225°, denoted by -S; and 315°, denoted by +S (Figure 2-5).

The length of the line segments approximating a given histogram
was measured by the number of rows (for vertical and slanted segments) or
number of columns (for horizontal segments) of the raster that the line
segment cut across. The ordering of the string associated with a given
histogram was determined by the ordering of the corresponding line segments
starting at the top of the raster for left and right histograms,
starting at the right for top and bottom histograms.

* - By statistical we mean a statement about a subset of the coordinates
rather than individual coordinates.


Just prior to generating the string representation some editing is
accomplished to fill in certain breaks in the character and also to smooth
the histogram representation. No attempt has been made to eliminate salt
and pepper noise since it was felt that this type of noise is highly dependent
upon the actual scanner used and the quality of the paper being read. Salt
and pepper noise elimination algorithms should be included during the
prototype development stage. The editing algorithm used here was chiefly
concerned with breaks along single stroke segments of the character. For
example, if $L_{i_0}$ and $L_{i_1}$ are the first and last (i.e.,
$L_{i_0 - 1} \neq 25$; $L_{i_1 + 1} \neq 25$) histogram coordinates of
value 25 in a consecutive string of coordinates equal to 25, then the
histogram coordinates $L_{i_0}, \ldots, L_{i_1}$ are changed to the average
value $(L_{i_0 - 1} + L_{i_1 + 1})/2$. The effect of this editing is
shown in Figure 2-6.


Upon completing the editing function, a difference string is generated
from the edited histogram representation. The difference string is
denoted $\Delta_i$ where $\Delta_i = V_{i+1} - V_i$, $i = 1, 2, \ldots, 23$
and $V = (V_1, V_2, \ldots, V_{24})$ represents an arbitrary histogram vector
after editing. The $\Delta_i$ string is next used to "fit" the character
contour with the straight line segments shown previously in Figure 2-5. This
procedure is conducted in a straightforward manner by first marking the
positions along the $\Delta_i$ string where sign changes occur. Between the
marks are segments where the histogram elements are either increasing or
decreasing monotonically, depending upon the sign of the $\Delta_i$ elements
of the corresponding segment. The slope of each segment is computed and used
to determine which straight line approximations of Figure 2-5 are
appropriate. The criterion for fitting is given below.


Slope                        String Symbol

$|SL| < 1/2$                 $V(\ell(s))$
$1/2 \le |SL| < 4$           $\pm S(\ell(s))$
$4 \le |SL|$                 $\pm H(|SL|)$


The following definitions apply to the above table:

$SL$            = slope of the segment
$\ell(s)$       = the length of the segment
$V(\ell(s))$    = a vertical segment of length $\ell(s)$
$S(\ell(s))$    = a slant segment of length $\ell(s)$
$H(|SL|)$       = a horizontal segment of length $|SL|$

The sign is determined in accord with the sign of the corresponding
$\Delta_i$ string. If a +H(N) is adjacent to a -H(M), the symbol V(2) is
inserted between. Upon completion of these steps a typical string for the
character three (3) might be: (left string)

-S(4), +H(6), V(2), -H(5), V(2), +H(8), V(3), -S(2), -H(7), V(4)

This string representation will be used to determine the convexities;
however, before proceeding to this step the first and last elements of the
string are modified. The width of the character is measured at the top
(designated T) and bottom (designated B). Next, a -H(T) is appended at the
beginning of the left string (a +H(T) at the beginning of the right string)
and a +H(B) is appended at the end of the left string (a -H(B) at the end of
the right string). If adjacent horizontals occur, they are combined as
follows:

$H(\ell_1), H(\ell_2) = H(\ell_1 + \ell_2)$.

By forcing each string to start with -H (+H) and end with +H (-H)
we ensure that the string has an odd number of convexities.

The number of convexities for each string of each character is now
calculated where a positive convexity is defined to be an increasing
substring of maximal length, a negative convexity is defined to be a
decreasing substring of maximal length where the symbols are ordered

+H(..) < +S(..) < V(..) < -S(..) < -H(..).

Example (negative convexities are bracketed above the string, positive
convexities below it):

+H(3), +S(1), -H(5), V(6), +S(4), -S(4), V(5), +H(6)

It  is  clear  that  every  string  (of  length  2  or  more)  breaks  down  into  an  alter¬ 
nating  sequence  of  positive  and  negative  convexities. 
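
The decomposition into convexities can be sketched in Python as follows. This is an illustration under stated assumptions, not the original program: a string is represented as a list of (symbol, length) pairs (a hypothetical format), the symbols are ordered +H < +S < V < -S < -H, and adjacent symbols are assumed never to be identical. Consecutive convexities share their boundary symbol.

```python
ORDER = {"+H": 1, "+S": 2, "V": 3, "-S": 4, "-H": 5}


def convexities(string):
    """Split a string into alternating positive/negative convexities.

    string: list of (symbol, length) pairs, at least two long.
    Returns a list of (sign, segment) where sign is "+" or "-" and the last
    element of each segment is the first element of the next one.
    """
    if len(string) < 2:
        raise ValueError("a string needs at least two symbols")
    ranks = [ORDER[sym] for sym, _ in string]
    if any(a == b for a, b in zip(ranks, ranks[1:])):
        raise ValueError("adjacent symbols must differ")
    runs = []
    start = 0
    rising = ranks[1] > ranks[0]
    for i in range(1, len(string) - 1):
        next_rising = ranks[i + 1] > ranks[i]
        if next_rising != rising:
            # Direction changes at symbol i: close the current convexity
            # there and start the next one at the same symbol.
            runs.append(("+" if rising else "-", string[start:i + 1]))
            start = i
            rising = next_rising
    runs.append(("+" if rising else "-", string[start:]))
    return runs


# Left string for the character three quoted earlier in this section.
three = [("-S", 4), ("+H", 6), ("V", 2), ("-H", 5), ("V", 2),
         ("+H", 8), ("V", 3), ("-S", 2), ("-H", 7), ("V", 4)]
signs = [sign for sign, _ in convexities(three)]
```

For the string for the character three, `signs` alternates negative, positive, negative, positive, negative, giving five convexities.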
For any character with more than 5 convexities in any of its strings
the following will be iterated until there are only 5 convexities in the string.
Find that segment in the string which has a minimal length and remove it.
This is subject to the condition that we do not remove an initial H or a
terminal H, nor do we remove any symbol which occurs between consecutive
H's of opposite sign. After we remove a symbol we re-combine the
symbols adjacent to it if they are identical, e.g., if +S(7), V(2), +S(6) is
a subsequence of a string, and we remove V(2), we would then combine the
two +S's to +S(13). After each removal we recalculate the number of
convexities until that number drops to 5.

In  order  to  apply  standard  pattern  recognition  techniques  to  the 
classification  of  the  strings  of  the  previous  section,  it  is  necessary  to  define 
a  function  M(s),  which  maps  each  string  s  into  a  vector  space  in  such  a 
way  that  strings  with  similar  shapes  are  mapped  into  vectors  which  are  close 
to  each  other  and  strings  with  dissimilar  shapes  are  not. 

The map M(s) will be defined on the positive convexities and then
extended to arbitrary strings. For ease of notation we replace the symbols
+H, +S, V, -S, -H by $A_1$, $A_2$, $A_3$, $A_4$, $A_5$,* respectively. Next
we add symbols of the form $A_i(0)$ to every positive convexity so that every
positive convexity has the form $A_1(k_1), A_2(k_2), A_3(k_3), A_4(k_4),
A_5(k_5)$. For example, $A_1(2), A_3(4), A_4(1)$ becomes $A_1(2), A_2(0),
A_3(4), A_4(1), A_5(0)$; $A_1(3), A_5(2)$ becomes $A_1(3), A_2(0), A_3(0),
A_4(0), A_5(2)$, etc.

We define the vector representation D(s) of a positive convexity
$s = A_1(k_1), A_2(k_2), A_3(k_3), A_4(k_4), A_5(k_5)$ to be the vector
$(k_1, k_2, k_3, k_4, k_5)$. Then M(s) is defined by
$M(s) = (k_1, k_1 + k_2, k_2 + k_3 + k_4, k_4 + k_5, k_5)$
where $D(s) = (k_1, k_2, k_3, k_4, k_5)$. The first and fifth measurements,
$k_1$ and $k_5$, are simply the lengths of the horizontal segments. The
second, third and fourth measurements are the lengths of the top horizontal
leg, a, vertical drop, b, and lower horizontal leg, c, respectively
(Figure 2-7).

For a negative convexity, we add symbols of the form $A_i(0)$ to it so that
it has the form $A_5(k_1), A_4(k_2), A_3(k_3), A_2(k_4), A_1(k_5)$. For
example, the negative convexity $A_4(1), A_2(1), A_1(7)$ becomes $A_5(0),
A_4(1), A_3(0), A_2(1), A_1(7)$. We define the inversion mapping I(s) on any
negative convexity $s = A_5(k_1), A_4(k_2), A_3(k_3), A_2(k_4), A_1(k_5)$ by
$I(s) = A_1(k_1), A_2(k_2), A_3(k_3), A_4(k_4), A_5(k_5)$.
Clearly, this maps the negative convexities into the positive convexities.
Figure 2-8 illustrates the effect of this mapping.

* - These two sets of symbols will be used interchangeably.

The mapping M is defined on a negative convexity s by:

$M(s) = (-a, -b, c, -d, -e)$ where $M(I(s)) = (a, b, c, d, e)$.
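
The following Python sketch illustrates the map M on a single convexity, covering the padding with zero-length symbols, the inversion I(s) for negative convexities, and the sign convention (-a, -b, c, -d, -e). The (symbol, length) list format and the function name are assumptions made for illustration; the formulas are those given above.

```python
INDEX = {"+H": 1, "+S": 2, "V": 3, "-S": 4, "-H": 5}


def measure(convexity):
    """Return M(s) for one convexity given as [(symbol, length), ...]."""
    idx = [INDEX[sym] for sym, _ in convexity]
    positive = idx == sorted(idx)
    negative = idx == sorted(idx, reverse=True)
    if len(idx) < 2 or len(set(idx)) != len(idx) or not (positive or negative):
        raise ValueError("not a convexity")
    k = [0] * 5                      # D(s) after padding with A_i(0)
    for i, (_, length) in zip(idx, convexity):
        # A negative convexity is first inverted: A_i is replaced by A_(6-i).
        k[(i if positive else 6 - i) - 1] = length
    m = (k[0], k[0] + k[1], k[1] + k[2] + k[3], k[3] + k[4], k[4])
    if positive:
        return m
    a, b, c, d, e = m
    return (-a, -b, c, -d, -e)
```

For the positive convexity A1(2), A3(4), A4(1) used as an example above, D(s) = (2, 0, 4, 1, 0) and the sketch returns (2, 2, 5, 1, 0).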

In order to define M on arbitrary strings*, we break the string down into
an alternating sequence of positive and negative convexities. The following
theorem shows that this is always possible.

Theorem:

Any string s of length > 1 can be written in a unique way as an
alternating sequence $a_1, a_2, \ldots, a_k$ of positive and negative
convexities where the last element of $a_i$ is the first element of
$a_{i+1}$, $i = 1, \ldots, k-1$, i.e., if $s = s_1, \ldots, s_n$, $n > 1$,
then there exist unique integers $1 < i_1 < i_2 < \ldots < i_{k-1} < n$
such that $a_1, a_2, \ldots, a_k$ is an alternating sequence of positive and
negative convexities where $a_1 = s_1, \ldots, s_{i_1}$;
$a_2 = s_{i_1}, \ldots, s_{i_2}$; ...; $a_k = s_{i_{k-1}}, \ldots, s_n$.

Proof:

The proof is by induction on the length n of the string. For any
string $s = s_1, s_2$ of length n = 2 we have either $s_1 < s_2$ or
$s_1 > s_2$. Thus, any string of length 2 is a convexity. Assume the theorem
is true for $n = n_0$. Let $s_1, \ldots, s_{n_0+1}$ be any string of length
$n_0 + 1$. Now by the induction hypothesis, applied to
$s_1, \ldots, s_{n_0}$, there are unique numbers
$1 < i_1 < i_2 < \ldots < i_{k-1} < n_0$ such that $a_1, a_2, \ldots, a_k$
is an alternating sequence of positive and negative convexities where
$a_1 = s_1, \ldots, s_{i_1}$; $a_2 = s_{i_1}, \ldots, s_{i_2}$; ...;
$a_k = s_{i_{k-1}}, \ldots, s_{n_0}$. Let us assume $a_k$ is positive. If
$s_{n_0+1} > s_{n_0}$ then $a'_k = s_{i_{k-1}}, \ldots, s_{n_0+1}$ is still
a positive convexity, and $a_1, a_2, \ldots, a_{k-1}, a'_k$ is the required
alternating sequence. Furthermore, this is the only alternating sequence
possible since the numbers $i_1, \ldots, i_{k-1}$ are unique (by the
induction hypothesis). If $s_{n_0+1} < s_{n_0}$, then
$a_{k+1} = s_{n_0}, s_{n_0+1}$ is a negative convexity so
$a_1, \ldots, a_k, a_{k+1}$ is the unique required alternating
sequence of positive and negative convexities. A similar argument holds if
$a_k$ is assumed to be a negative convexity.

* - Strictly speaking the strings we deal with are not arbitrary since
$A_i(k)$ followed by $A_j(k')$ implies $i \neq j$.

-H(1), V(2), +S(1)  -->  +H(1), V(2), -S(1)

A5(1), A3(2), A2(1)  -->  A1(1), A3(2), A4(1)

Figure 2-8

Using this theorem we extend the map M to arbitrary strings as
follows: If $s = s_1, \ldots, s_n$ is an arbitrary string, then we define
$M(s) = (M(a_1), M(a_2), \ldots, M(a_k))$ where $a_1, \ldots, a_k$ is the
alternating sequence of convexities associated with s as described in the
above theorem. Since each string has 1, 3, or 5 convexities we thus have
5, 15, or 25 measurements for each string respectively.
SECTION 3


THE SPECIAL FEATURES


3.1.  THE MIDUP MEASUREMENT


The first of the eight special measurements is designated MIDUP.
As the name implies, this feature measures a characteristic related to the
upward view of the character from a row somewhere around the middle of
the character. The row used is row 16. The upward view of the character
from row 16 is obtained by computing a "midline-up" histogram designated
MHIST. The I-th element of MHIST, designated MHIST(I), is simply
the row number of the first non-zero bit encountered when scanning the I-th
column upward from (and including) the 16th row. In the case where no
non-zero bit is found, the value of MHIST for that column is set equal to
zero. The midline-up histogram for the character "two" of Figure 3-1
is listed in Table 3.1.


TABLE 3.1.

Midline-Up Histogram        Topdown Histogram

I        MHIST(I)        THIST(I)

4  5  6  8  23  24

0  0  0  0  0  0  16  14  12  9  0  0  0  12  0  0  0  0

24  24  4  1  1  1  3  3  6  21  21  12  24  24  24
The MHIST histogram is used to determine the beginning column and the
ending column of the "MIDUP portion" of the character, the
columns being designated BEGIN and END respectively. Next, the maximum
histogram value in columns BEGIN through BEGIN+3 inclusive is found
and designated MAX1. The maximum histogram value in columns END-6
through END inclusive is found and designated MAX2. Finally, the minimum
histogram value in columns BEGIN+3 through END-4 inclusive is found
and designated MIN. These three measurements are combined as follows
to produce the value of the MIDUP feature.


$$
\mathrm{MIDUP} =
\begin{cases}
\mathrm{MAX1} + \mathrm{MAX2} - 2 \cdot \mathrm{MIN}, & \mathrm{END} - \mathrm{BEGIN} > 7 \\
0, & \text{otherwise}
\end{cases}
$$

where

MAX1 = MAX {MHIST(I)},  I = BEGIN, BEGIN+1, ..., BEGIN+3

MAX2 = MAX {MHIST(I)},  I = END-6, END-5, ..., END

MIN  = MIN {MHIST(I)},  I = BEGIN+3, ..., END-4

Referring to Table 3.1., it is seen that for the raster of Figure 3-1:

BEGIN = 7th Column (I)
END   = 19th Column (I)
MAX1  = 16
MAX2  = 14
MIN   = 9
MIDUP = 16 + 14 - 2·9 = 12

3.2.  MIDUP2


The second special feature is designated MIDUP2. Its value is
determined by counting the number of rows between "middle" row 16 and
the row containing the first non-zero bit along the J-th column, where
$J = L_{16} - 1$, when scanning upward from (but not including) row 16.
Stated differently, the column to be checked for a non-zero bit is determined
by scanning the 16th row from the left until the first non-zero bit is found.
By backing off one column, the column which will be scanned next is
determined. This column is simply $L_{16} - 1$. Finally, the $L_{16} - 1$
column is scanned upward from row 16 until a non-zero bit is found. The row
number containing this bit is subtracted from 16 to produce MIDUP2. Turning
to the example shown in Figure 3-1, it is seen that $L_{16} - 1 = 7$ and that
the row containing the first non-zero bit is row 8. Thus MIDUP2 = 16 - 8 = 8.
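
A minimal Python sketch of the MIDUP2 scan follows. The text does not say what value is used when row 16 is empty, when the first mark is already in column 1, or when no bit is found above row 16; returning 0 in those cases is an assumption, as are the function name and raster format.

```python
def midup2(raster, mid_row=16):
    """MIDUP2 for a binary raster given as a list of rows of 0/1.

    Rows and columns are referred to by 1-based numbers in the comments
    to match the text.
    """
    row = raster[mid_row - 1]
    marks = [j for j, bit in enumerate(row, start=1) if bit]
    if not marks or marks[0] == 1:
        return 0            # assumption: no column to back off into
    column = marks[0] - 1   # L_16 - 1
    # Scan the chosen column upward, starting just above the middle row.
    for r in range(mid_row - 1, 0, -1):
        if raster[r - 1][column - 1]:
            return mid_row - r
    return 0                # assumption: nothing found above the middle row
```

With a first mark in column 8 of row 16 and a mark in row 8 of column 7, the sketch returns 16 - 8 = 8, the same arithmetic as in the example above.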

The MIDUP and MIDUP2 features are useful in discriminating
certain sevens from either fours, nines or A's. Consider for example,
sevens such as:

[illustration of two handprinted sevens]

The first seven will resemble a closed-top four or a skew A and the second
will resemble a nine when viewing these characters from the left and right
sides. The view from the bottom of the first 7 may be obscured by the
slanted stem of the 7. However, the MIDUP and MIDUP2 measurements
allow these sevens to be distinguished since the view up from the "middle"
line of fours, nines and A's will be blocked by a relatively low horizontal
stroke which is not present in the case of a seven.


3.3.  MOTOP

The third of the eight special measurements is designated MOTOP.
Effectively, this feature measures the degree of openness at the top of a
character and hence the name "open top measurement", symbolically
referenced MOTOP. This feature is derived from viewing the character
from the top row and is computed from the values of a "topdown" histogram
designated THIST. The value of the I-th element of THIST is THIST(I)
and is simply the row number of the first non-zero bit in the I-th column.
The topdown histogram for the character "two" of Figure 3-1 is listed in
Table 3.1. The THIST histogram is first used to determine the beginning
column and the ending column of the character to be used for the MOTOP
computation, the columns being designated BEGIN and END respectively.
Next, the maximum histogram value in columns BEGIN+2 through END-2
inclusive is found and designated TMAX. The minimum histogram value in
columns BEGIN through BEGIN+3 inclusive is determined next and
designated TMIN1. Finally, the minimum histogram value in columns END-3
through END inclusive is found and designated TMIN2. These measurements
are combined to produce the value of the MOTOP feature as shown below:

$$
\mathrm{MOTOP} =
\begin{cases}
2 \cdot \mathrm{TMAX} - (\mathrm{TMIN1} + \mathrm{TMIN2}), & \mathrm{END} - \mathrm{BEGIN} > 8 \\
0, & \text{otherwise}
\end{cases}
$$

where

TMAX  = MAX {THIST(I)},  I = BEGIN+2, BEGIN+3, ..., END-2

TMIN1 = MIN {THIST(I)},  I = BEGIN, BEGIN+1, ..., BEGIN+3

TMIN2 = MIN {THIST(I)},  I = END-3, END-2, ..., END


Referring to Table 3.1., it is seen that for the raster of Figure 3-1:

BEGIN = 7th Column (I)
END   = 19th Column (I)
TMAX  = 21
TMIN1 = 1
TMIN2 = 12

and therefore MOTOP = 2·21 - (1 + 12) = 29.
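
The MOTOP computation has the same shape as MIDUP and can be sketched in Python in the same way. BEGIN and END are again passed in rather than derived, and the THIST values in the usage lines are hypothetical, chosen only to give TMAX = 21, TMIN1 = 1 and TMIN2 = 12 as in the worked example.

```python
def motop(thist, begin, end):
    """MOTOP from a topdown histogram.

    thist: mapping from 1-based column number I to THIST(I).
    begin, end: the BEGIN and END columns.
    """
    if end - begin <= 8:
        return 0
    tmax = max(thist[i] for i in range(begin + 2, end - 1))   # BEGIN+2 .. END-2
    tmin1 = min(thist[i] for i in range(begin, begin + 4))    # BEGIN .. BEGIN+3
    tmin2 = min(thist[i] for i in range(end - 3, end + 1))    # END-3 .. END
    return 2 * tmax - (tmin1 + tmin2)


# Hypothetical THIST values for columns 7 through 19 (not the report's data).
example_thist = dict(zip(range(7, 20),
                         [4, 1, 1, 1, 3, 3, 6, 21, 21, 21, 21, 12, 12]))
example_motop = motop(example_thist, 7, 19)   # 2*21 - (1 + 12)
```

A character that is open at the top, such as a U or a 4 with an open top, has a deep valley between two high edges in THIST, which makes TMAX large relative to TMIN1 and TMIN2.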


3.4.  AVERAGE WIDTH MEASUREMENTS


Three additional special features are measured which pertain to
the average width of the character. The first of these measures is the
average width across a segment located near the bottom of the character
and is designated BOTAVE. The second measure is the average width
across a segment located near the middle of the character and is
designated MIDAVE. The last measure is the average width over a large central
region of the character and is designated OVRAVE. The width of the I-th
row is given by RHIST(I) - LHIST(I) + 1, where RHIST and LHIST refer to
the break-corrected histograms. Using this notation, the three average
width features are given by:


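$$\text{BOTAVE} = \frac{1}{6} \sum_{I=16}^{21} \left( \text{RHIST}(I) - \text{LHIST}(I) + 1 \right)$$

$$\text{MIDAVE} = \frac{1}{6} \sum_{I=10}^{15} \left( \text{RHIST}(I) - \text{LHIST}(I) + 1 \right)$$

$$\text{OVRAVE} = \frac{1}{16} \sum_{I=5}^{20} \left( \text{RHIST}(I) - \text{LHIST}(I) + 1 \right)$$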

Using the left and right histogram values listed in Table 3-2
corresponding to the "two" of Figure 3-2, the following values are
computed:

    BOTAVE  =  [43/6]  =  7

    MIDAVE  =  [27/6]  =  4

    OVRAVE  =  [96/16]  =  5

In each case, the lower integer value is used as the feature value.
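
As a rough check of the width averages, the sketch below recomputes BOTAVE and MIDAVE from the legible rows of Table 3-2. The function name `average_width` is hypothetical, and the row ranges (10 through 15 for MIDAVE, 16 through 21 for BOTAVE) are an assumption: they are the six-row windows that reproduce the sums 27 and 43 quoted above.

```python
def average_width(lhist, rhist, first_row, last_row):
    """Average row width over first_row..last_row, truncated to an integer."""
    rows = range(first_row, last_row + 1)
    total = sum(rhist[i] - lhist[i] + 1 for i in rows)
    # The lower integer value is used as the feature value.
    return total // len(rows)


# Break-corrected values for rows 10..21 as read from Table 3-2.
LHIST = {10: 12, 11: 12, 12: 11, 13: 10, 14: 10, 15: 9,
         16: 8, 17: 8, 18: 8, 19: 7, 20: 7, 21: 7}
RHIST = {10: 14, 11: 14, 12: 19, 13: 13, 14: 13, 15: 12,
         16: 11, 17: 11, 18: 11, 19: 15, 20: 15, 21: 19}

midave = average_width(LHIST, RHIST, 10, 15)  # 27 // 6
botave = average_width(LHIST, RHIST, 16, 21)  # 43 // 6
print(midave, botave)
```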


3.  5.  TOPLIN  and  BOTLIN 

The remaining two of the eight special features are related to the
number of line segments which are crossed when scanning across a specified
group of rows. For the purpose of this computation, a line segment is
defined by the presence of one or more consecutive one bits which are
bordered on the left and right by zeros when scanning a row of the character.
For example, the following row contains two line segments:

0000110001110000 

The first of these features, designated TOPLIN, is simply a count of the
total number of line segments determined by scanning rows 5 through 9
inclusive. The second, designated BOTLIN, is a count of the total number
of line segments for rows 16 through 20 inclusive. Following this procedure
on the "two" of Figure 3-1, it is determined that:

TOPLIN  =  8 

BOTLIN  =  7 

It should be evident that the TOPLIN and BOTLIN features are
highly related to the discrimination of eights, H's and X's from other
characters. These are sometimes malformed in the sense that the shape
information derived from the contours is unreliable. In these instances,
the presence of two line segments at the top and the bottom, resulting in
large TOPLIN and BOTLIN values, is very useful.
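
The line-segment count used for TOPLIN and BOTLIN can be sketched as follows. The names `count_segments` and `line_feature` are hypothetical, rows are assumed to be strings of '0' and '1' characters, and a run of ones that touches the edge of the raster is counted as a segment, which is an assumption the text does not address.

```python
def count_segments(row):
    """Count runs of consecutive '1' bits in one raster row."""
    count = 0
    previous = "0"
    for bit in row:
        # A new segment starts wherever a one follows a zero (or the row start).
        if bit == "1" and previous != "1":
            count += 1
        previous = bit
    return count


def line_feature(raster, first_row, last_row):
    """Total segments over 1-based rows first_row..last_row inclusive."""
    return sum(count_segments(raster[i - 1])
               for i in range(first_row, last_row + 1))


print(count_segments("0000110001110000"))  # 2
```

Under these assumptions TOPLIN would be `line_feature(raster, 5, 9)` and BOTLIN would be `line_feature(raster, 16, 20)`.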
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TABLE  3-2 


  I     Left Histogram                   Right Histogram
        LHIST(I)                         RHIST(I)

  1     10                               12
  2     10                               12
  3      9                               14
  4     [illegible]                      14
  5     [illegible]                      14
  6      7                               15
  7      7                               15
  8      7                               15
  9     13                               15
 10     12                               14
 11     12                               14
 12     11                               19
 13     10                               13
 14     10                               13
 15     25 (9 after break correction)    25 (12 after break correction)
 16      8                               11
 17      8                               11
 18      8                               11
 19      7                               15
 20      7                               15
 21      7                               19
 22      8                               19
 23      8                               19
 24      8                               19
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Figure  3-2 
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SECTION  4 


RECOGNITION  LOGIC 


4.  1.  THE  DECISION  LOGIC 

As explained in Section 2, measurements are taken from each of
the four contours, left, right, top and bottom, of each character. These
standard measurements in addition to the special measurements comprise
the feature vector upon which the logic operates. * For each of the
(36 x 35)/2 = 630 pairs of classes a partial decision is reached by operating
on a subset of the feature vector composed of the standard measurements
taken from two of the four contours plus the 8 special measurements. The
two contours used depend on the particular character class pair in question.
For example, for the pair H/N, the top and bottom contours are used since
these contours correspond to the two "looks" which give the best discrimination.
That is, looking from the left and right these characters appear identical,
but looking from the top and bottom they do not. The two contours
used for each character pair are listed in Appendix B of (3).

The outcome of each pairwise decision can be a vote for one of the
two classes or a no vote decision. Thus, if the character class pair were
A/B the outcome would be a vote for A, a vote for B, or a vote for neither.
The final decision is reached by adding all the votes for each character
class and choosing the class which has the maximum number of votes. If
two or more classes receive a maximum number of votes the character is
rejected. **
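
The vote-counting rule can be illustrated with a minimal sketch. The function name `tally_votes` and the representation of a pairwise outcome (the winning class label, or `None` for a no vote decision) are assumptions made for the example; the actual program is the one described in Section 7.

```python
from collections import Counter


def tally_votes(outcomes):
    """Return the unique class with the most votes, or None for a reject.

    `outcomes` holds one entry per pairwise test: the label of the class
    that received the vote, or None when the test cast no vote.
    """
    votes = Counter(label for label in outcomes if label is not None)
    if not votes:
        return None  # no class received any vote
    ranked = votes.most_common()
    top_count = ranked[0][1]
    leaders = [label for label, count in ranked if count == top_count]
    # Two or more classes tied for the maximum: the character is rejected.
    return leaders[0] if len(leaders) == 1 else None


print(tally_votes(["A", "A", None, "B"]))  # A
print(tally_votes(["A", "B", None]))       # None (tie, rejected)
```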

Before  we  discuss  the  details  of  how  each  pairwise  decision  is 
made,  a  discussion  of  linear  discriminant  logic  is  in  order. 

Let us denote the feature vector X of a character by:

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_L \end{bmatrix} \qquad (4.1)$$

* - A detailed description of the program which implements the recognition
logic is contained in Section 7.

** - Reject strategies based on the number of votes the maximum class
receives are discussed in the following section.


A linear discriminant is computed by taking the inner product of the
discriminant vector d with the character feature vector X where:

$$d = \begin{bmatrix} d_1 \\ \vdots \\ d_L \end{bmatrix} \qquad (4.2)$$

The inner product generates a scalar Z (i.e., a number) which is used to
make a decision between two classes. If the value of Z exceeds a threshold
$\theta$, the decision is made for one of the classes; otherwise the other class
is decided. Specifically, the inner product is given by:

$$Z = \langle d, X \rangle = d^T X = \sum_{i=1}^{L} d_i x_i \qquad (4.3)$$

As seen in equation (4.3), a linear discriminant is nothing more than a
weighted linear combination of the features $x_i$. Our problem simply
amounts to computing good weights (or equivalently a vector d) for
discriminating the pair of classes in question. The optimal procedure used
for both the numeric and alpha-numeric logic is based upon the Fisher
Linear Discriminant.

Programs PHASEONE and PHASONEA of the ANLP produce weights
such that the difference between the mean values of Z (equation (4.3)) for
the two classes is maximized relative to the sum of the scatter of Z for the
two classes. * The mathematical derivation of the optimal linear
discriminant is given in (2).

* - The scatter is a statistic closely related to the variance. The scatter is
equal to (N-1) times the variance where N is the number of samples used
to estimate the variance.
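
The quantity being maximized can be made concrete with a small sketch. It evaluates, for a given weight vector d, the squared difference of the class means of Z divided by the sum of the two scatters. The squared form is the usual statement of the Fisher criterion and is an assumption here, as are the function names and the toy samples; this is not the PHASEONE or PHASONEA code.

```python
def project(d, x):
    """Z = sum of d_i * x_i, as in equation (4.3)."""
    return sum(di * xi for di, xi in zip(d, x))


def scatter(values):
    """Sum of squared deviations from the mean: (N-1) times the variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fisher_criterion(d, class_i, class_j):
    """Squared gap between the class means of Z over the summed scatter."""
    z_i = [project(d, x) for x in class_i]
    z_j = [project(d, x) for x in class_j]
    mean_gap = sum(z_i) / len(z_i) - sum(z_j) / len(z_j)
    return mean_gap ** 2 / (scatter(z_i) + scatter(z_j))


# Toy samples (hypothetical): the classes differ along the first feature only.
class_i = [(2, 0), (3, 1), (4, 0)]
class_j = [(-2, 0), (-3, 1), (-4, 0)]

print(fisher_criterion((1, 0), class_i, class_j))  # 9.0
print(fisher_criterion((0, 1), class_i, class_j))  # 0.0
```

A good weight vector scores high on this measure; the direction (1, 0) separates the toy classes while (0, 1) does not.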
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The  following  geometric  interpretation  of  the  decision  procedure 
may be helpful. Consider the case where L = 2; that is, only two features
$x_1$ and $x_2$ are used to discriminate the classes I and J. The sample
data from the two classes are represented as feature vectors in the $x_1$, $x_2$
space as shown in Figure 4-1. The decision rule specified by a linear
discriminant requires that the inner product between an unknown vector X
and the discriminant vector d be computed. It is easily shown that an
equivalent form of equation 4.3 is:

$$Z = |d| \, |X| \cos \alpha \qquad (4.4)$$

where $|d|$ is the vector length of the discriminant vector. The discriminant
vector is normalized such that $|d| = 1$ and so

$$Z = |X| \cos \alpha \qquad (4.5)$$

where $\alpha$ is the angle between the vectors X and d. Therefore, Z simply
is the orthogonal projection of X onto the direction of d as shown in
Figure 4-1. Since the decision rule is:

$$\begin{aligned} &\text{Decide I if } Z > \theta \\ &\text{Decide J if } Z < \theta \end{aligned} \qquad (4.6)$$

the decision boundary is given by the locus of all points such that $Z = \theta$.

For this case, the decision boundary is simply a straight line perpendicular
to the direction of d at a distance of $\theta$ from the origin along d. In
general the decision boundary implemented by a linear discriminant is a
linear hyperplane which divides the feature space into two regions, one
associated with class I and the other with class J.
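
The projection and threshold test can be sketched as below. The function names are hypothetical, and returning `None` when Z falls exactly on the boundary is an assumption, since the rule as stated leaves that case undefined.

```python
import math


def normalize(d):
    """Scale d to unit length so that Z is the projection of X onto d."""
    length = math.sqrt(sum(c * c for c in d))
    return [c / length for c in d]


def decide(d, x, theta):
    """Apply the decision rule: I if Z > theta, J if Z < theta."""
    unit_d = normalize(d)
    z = sum(di * xi for di, xi in zip(unit_d, x))
    if z > theta:
        return "I"
    if z < theta:
        return "J"
    return None  # exactly on the decision boundary


print(decide([3, 4], [3, 4], 1.0))  # I (the projection is about 5)
print(decide([3, 4], [0, 0], 1.0))  # J (the projection is 0)
```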


We now return to the detail of how each pairwise decision is made.
The type of logic was determined by the number of available samples from
the two classes. The following rules were used:
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Figure  4-1 

Linear Discriminant, L = 2, Geometric Interpretation

    Type    N_I      N_J      Logic

     1      >= 2     <  2     Decide I
     2      <  2     >= 2     Decide J
     3      <  2     <  2     No Vote
     4      >= 2     >= 2     Optimal Linear Discriminant

where N_I equals the number of samples from class I. Types 1-3 were
used in approximately 65% of the pairwise cases. The reason for this is
that every character has a preferred set of sort groups where
it will normally be found, where the sort group is determined by the number
of convexities a character has in each of two specified contours. For
example, the numeral seven will be in the 3,1 sort group when viewed from
the left and right, respectively; i.e., it will have 3 convexities on the left
and one on the right most of the time.

In  contrast,  an  E  never  has  just  one  convexity  on  the  right  and  thus 
is  never  in  the  3, 1  sort  class  when  viewed  from  the  left  and  right.  Since 
the  left  and  right  looks  are  the  ones  used  in  the  E/7  test,  and  since  charac¬ 
ters  are  grouped  by  sort  class  prior  to  entering  the  decision  logic,  the 
decision  logic  for  E/7  in  the  3,  1  sort  class  simply  is  "decide  7".  Similarly,  in 
the  1,  5  sort  class  (when  viewing  from  left  and  right),  since  no  sevens  can 
occur  there  and  E  commonly  occurs  there,  the  logic  is  simply  "decide  E". 
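
The choice of logic type for a pair within one sort class can be sketched as a small function. The name `pairwise_logic_type` is hypothetical, and the sample threshold of 2 with a "greater than or equal" comparison is an assumption read from the poorly legible rules table.

```python
def pairwise_logic_type(n_i, n_j, min_samples=2):
    """Select the logic for the class pair I/J from the design sample counts."""
    enough_i = n_i >= min_samples
    enough_j = n_j >= min_samples
    if enough_i and enough_j:
        return "optimal linear discriminant"
    if enough_i:
        return "decide I"
    if enough_j:
        return "decide J"
    return "no vote"


# E/7 in the 3,1 sort class: no E samples there, many sevens (counts invented).
print(pairwise_logic_type(0, 40))  # decide J, i.e. "decide 7"
```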




4.  2.  LOGIC  OPTIMIZATION 


The following is a description of the optimal strategy for sequencing
through the logical computations pertinent to reject strategy A (see next
section). Let K denote the number of classes (K = 36 for the alpha-numeric
problem) and assume that the rejection strategy requires that some
class receive all of the possible votes (i.e., (K-1)), otherwise the character
is rejected. Furthermore, let us assume that all of the logic for the
K(K-1)/2 pairwise tests resides in core. Now, since the computer is a
sequential device we must direct it to compute the pairwise tests in some
prescribed order. We could compute all tests in an arbitrary order and
simply tally up the votes for each class at the completion. Any class
receiving (K-1) votes would be the decision class, otherwise a rejection is
signaled. This strategy can easily be improved upon since it is possible to
make a final classification or rejection without computing all K(K-1)/2
pairwise decisions. The optimal strategy, which requires the fewest tests
and therefore ensures the fastest throughput, would operate as follows:
first, all of the pairwise tests, say I vs J ($T_{IJ}$), would be ordered in accord
with the class probabilities, i.e., rank order $T_{IJ}$ such that

$$
T_{IJ} > T_{RQ} \quad \text{iff} \quad
\begin{cases}
P_I > P_R, & I \neq R \\
P_J > P_Q, & I = R
\end{cases}
$$

Suppose the ranking were as follows for a four class problem:
$P_1 > P_2 > P_3 > P_4$; the pairwise tests would then be ordered
1/2, 1/3, 1/4, 2/3, 2/4, 3/4. The first test would be 1 vs 2. The
next test will be determined on the basis of the outcome of the preceding
test. For example, suppose the vote goes to class 1. In this case, class 2
cannot receive the maximum number of votes (i.e., (K-1) = 3) and thus the
only classes in contention are 1, 3, and 4. Therefore, we would select the
next highest probability pair not involving class 2. In this example, 1 vs 3
would be chosen. Now suppose that class 3 receives the vote, in which case
neither 1 nor 2 can win. Repeating the above procedure, the next highest
probability pair not involving classes 1 or 2 is the 3 vs 4 test. Suppose the
outcome of this test is a vote for class 3. Now we know that classes 1, 2,
and 4 cannot be winners; however, we do not know if class 3 is the winner
without checking to see if class 3 receives the maximum number of votes.
Thus, we would next perform the 2 vs 3 test. If 3 wins, then the final
decision is 3, otherwise a rejection is issued. Notice in this case 4 tests
were required. In general, the table below lists the number of tests


required assuming $P_1 > P_2 > \cdots > P_K$.

    True Class      # of Tests Required

        1               K - 1
        2               K - 1
        3               K
        4               K + 1
        5               K + 2
        .               .
        .               .
        K               2K - 3

Table 4 - 1
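
The sequencing strategy can be simulated to reproduce the test counts of Table 4 - 1. In this sketch classes are numbered 1 to K in order of decreasing probability. The names `sequential_decision` and `make_outcome` are hypothetical, and the simulated pairwise outcome (the true class always wins its own tests, and otherwise the lower-numbered class wins) is an assumption made only so the counts can be checked.

```python
def sequential_decision(k, outcome):
    """Run pairwise tests in rank order, skipping classes that cannot win.

    `outcome(i, j)` with i < j returns i, j, or None (no vote).
    Returns (decision or None for reject, number of tests performed).
    """
    alive = list(range(1, k + 1))  # classes still able to reach K-1 votes
    tested = set()
    tests = 0
    # Elimination: the highest-ranked untested pair among the classes in
    # contention is always the two lowest-numbered survivors.
    while len(alive) > 1:
        i, j = alive[0], alive[1]
        winner = outcome(i, j)
        tested.add((i, j))
        tests += 1
        losers = {i, j} - {winner}
        alive = [c for c in alive if c not in losers]
    if not alive:
        return None, tests
    candidate = alive[0]
    # Verification: the survivor must also win every test not yet performed.
    for other in range(1, k + 1):
        if other == candidate:
            continue
        pair = (min(other, candidate), max(other, candidate))
        if pair in tested:
            continue
        tests += 1
        if outcome(*pair) != candidate:
            return None, tests
    return candidate, tests


def make_outcome(true_class):
    """Simulated logic: the true class wins; otherwise the lower label wins."""
    def outcome(i, j):
        return true_class if true_class in (i, j) else i
    return outcome


print(sequential_decision(4, make_outcome(3)))  # (3, 4)
```

For the four class example with true class 3, the simulation performs the tests 1 vs 2, 1 vs 3, 3 vs 4 and 2 vs 3, four tests in all.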

The expected number of tests is given by

$$T = P_1 [K - 1] + \sum_{I=2}^{K} P_I [K + I - 3]$$

Notice, if all the classes were equally likely, then $P_I = 1/K$ and

$$
T \Big|_{P_I = 1/K}
= \frac{1}{K} \left[ (K - 1) + (K - 1)(K - 3) + \sum_{I=2}^{K} I \right]
= \frac{1}{K} \left[ (K - 2) + (K - 1)(K - 3) + \frac{K(K+1)}{2} \right]
$$


It is interesting to compare T (for $P_I = 1/K$) to the total number of
pairwise tests in order to appreciate the significance of the potential savings.

     K      T (P_I = 1/K)      Total Tests = K(K-1)/2

     2          1                     1
     3          2.33                  3
    10         12.6                  45
    36         51.5                 630


Notice that for 36 classes the expected reduction in computation time might
be in the order of 80%.
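
The expected number of tests can be evaluated directly from the formula above. This is a minimal sketch with hypothetical function names; `probabilities` is assumed to be listed in decreasing order, class 1 first.

```python
def expected_tests(probabilities):
    """T = P_1 (K - 1) + sum over I = 2..K of P_I (K + I - 3)."""
    k = len(probabilities)
    total = probabilities[0] * (k - 1)
    for i in range(2, k + 1):
        total += probabilities[i - 1] * (k + i - 3)
    return total


def expected_tests_uniform(k):
    """Closed form for equally likely classes, P_I = 1/K."""
    return ((k - 2) + (k - 1) * (k - 3) + k * (k + 1) / 2) / k


for k in (2, 3, 10, 36):
    print(k, round(expected_tests_uniform(k), 2), k * (k - 1) // 2)
```

The loop prints K, the expected number of tests for equally likely classes, and the total number of pairwise tests K(K-1)/2.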

The  above  strategy  could  be  extended  to  the  case  where  the  entire 
pairwise logic cannot fit into core at the same time. In this case, we must
consider  the  access  time  required  to  retrieve  the  pairwise  logic  from  the 
mass  storage  device.  The  optimal  strategy  must  not  only  consider  the 
number  of  tests  but  also  the  time  to  retrieve  the  test  from  storage.  The 
strategy  actually  used  during  the  independent  test  of  our  alpha-numeric  logic 
was  of  this  nature  and  is  described  in  Vote  Assignment,  Vote  Tally,  and 
Logic  Select  Procedures  in  Section  7  of  this  report. 
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SECTION  5 


REJECT  STRATEGY  AND  ERROR  ANALYSIS 


As described previously, the decision logic operates by a vote
counting procedure. In order to investigate the utility of reject strategies
based on the number of votes the winning class received, the results of the
independent test were analyzed. The output of the TWO program of the
ANLP which performed the independent test listed the maximum number of
votes each character received and indicated which class(es) received that
number of votes.

The reject strategy which resulted in the lowest number of rejects
is to simply reject a character in the case of ties. Thus, if the class which
received the maximum number of votes was unique and matched the true
identity class of the character, a correct decision resulted; if the class which
received the maximum number of votes was unique and was not the true
class identity, a substitution occurred; if more than one class received a
maximal number of votes the character was rejected. This strategy,
Strategy D, may be considered minimal since there was no way (based on
vote counting) of making a logical decision between two or more classes
all of which received a maximal number of votes, and thus the character
must be designated a reject.

Since there were 36 classes in the alpha-numeric set, the most
votes a character could receive for any one class was 35. For example,
the class D would gain one vote from each of the 35 pairwise tests A/D, B/D,
C/D, E/D, F/D, ..., Z/D, 0/D, ..., 9/D that was decided in favor of D. By
insisting that the class have at least 35 votes and rejecting in case of ties,
we obtain reject strategy A. By replacing 35 by 34 and 33 respectively
in the above sentence, we have the definitions of reject strategies B and C
respectively.
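
The four reject strategies differ only in the minimum number of votes demanded of the winning class, which the following sketch makes explicit. The names `STRATEGY_MIN_VOTES` and `apply_reject_strategy` and the sample vote counts are hypothetical.

```python
# Minimum votes required of the winner; strategy D requires only uniqueness.
STRATEGY_MIN_VOTES = {"A": 35, "B": 34, "C": 33, "D": None}


def apply_reject_strategy(votes, strategy):
    """Return the decision class, or None if the character is rejected.

    `votes` maps each class label to the number of votes it received.
    """
    top = max(votes.values())
    leaders = [label for label, count in votes.items() if count == top]
    if len(leaders) > 1:
        return None  # ties are rejected under every strategy
    minimum = STRATEGY_MIN_VOTES[strategy]
    if minimum is not None and top < minimum:
        return None
    return leaders[0]


sample_votes = {"S": 34, "5": 33, "B": 20}
print([apply_reject_strategy(sample_votes, s) for s in "ABCD"])
# [None, 'S', 'S', 'S']
```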

The error and reject rates on the independent test using strategies
A, B, C, and D are indicated in Figure 5-1. As we would expect, the
errors decrease in the order D, C, B, A, and the rejects increase in the
order D, C, B, A, since a character which is rejected using strategy D
will surely be rejected using strategy C, a character rejected using strategy
C will be rejected using strategy B, etc. That is, if the maximum character
class is tied it certainly satisfies the criterion of receiving less than 33
votes or being tied; if the maximum character class receives less than 33
votes or is tied it certainly receives less than 34 votes or is tied, etc.

The substitution rate of 9.14% using strategy A at first
seemed higher than expected. Analysis of the substitution table (Figure 5-2),
however, revealed that certain character pairs have inordinately high
substitutions between them. These "confusion pairs" account for a large portion
of the substitutions. Figure 5-3 lists the most commonly confused character
pairs and the number of characters in confusion using reject strategies A,
B, C, where the number of characters in confusion for pair x, y is
defined to be the number of characters of true class x whose decision class
was y plus the number of characters of true class y whose decision class
was x.

As expected O's and 0's, S's and 5's, 2's and Z's, G's and 6's,
I's and 1's, V's and U's, and B's and 8's head the list. Discounting just
these confusion pairs reduces the substitution rate (using reject strategy A)
from 9.14% to 3%. Discounting the rest of the confusion pairs in Figure 5-3
reduces the substitution rate to 1%. A review of Figure 1-1, Section 1, is
all that is needed to appreciate why these pairs are in confusion. The
characters contained in Figure 5-4 through Figure 5-6, taken from the test
set data, further illustrate why certain character pairs are in confusion.
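
The number of characters in confusion for a pair x, y, as defined above, can be computed from a list of (true class, decision class) results. The function name `confusion_count` and the sample results are hypothetical and are not taken from the independent test.

```python
def confusion_count(results, x, y):
    """Characters of true class x decided as y, plus those of y decided as x."""
    return sum(1 for true_class, decision in results
               if (true_class, decision) in ((x, y), (y, x)))


# Invented (true class, decision class) results for illustration only.
sample_results = [("S", "5"), ("5", "S"), ("S", "S"), ("5", "5"),
                  ("S", "5"), ("Z", "2")]

print(confusion_count(sample_results, "S", "5"))  # 3
```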
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Confusion  with  Reject  Strategy 


    Pair        A       B       C

    O/0
    S/5        56      59      59
    Z/2        48
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Figure  5-3 
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SECTION  6 
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DATA  BASE  EDITING  PACKAGE 


In an attempt to improve the design of the decision logic, PAR
elected to develop a data base editing package. This package was used to
prepare a data base suitable for logic design. In addition to correctly
labeling mis-labeled characters, certain noise elimination operations were
implemented. Specifically, rows or columns of noise caused by an intermittent
"wrap around" effect of the camera were eliminated, as well as
stray noise bits. In all cases the noise was separate from the character
so that these eliminations did not alter the shape of the character. In addition,
characters which were so malformed as to be humanly illegible were
deleted. This was essential to good logic design since illegible characters
as represented in the feature space have the same deleterious effect as noise
on the design of optimal discriminant boundaries.

It  should  be  noted  that  no  editing  was  performed  on  the  test  data 
since  any  such  editing  would  be  counter  to  the  objective  of  testing  the  sys¬ 
tem  on  "live"  unconstrained  characters. 

The  data  base  editing  package  was  implemented  on  the  CDC-1604 
using  the  BR-85  as  the  inspection  medium.  To  reduce  the  programming 
effort  to  a  minimum,  the  CDC-1604  operating  system  and  various  existing 
routines  developed  by  PAR  for  the  manipulation  of  the  BR-85  display  were 
employed. 

The  input  to  the  editing  programs  is  a  tape  containing  a  digitized 
version  of  each  character  scanned  by  the  Image  Dissector  and  written  by 
the  PDP-8  at  a  density  of  800  BPI.  Since  the  CDC-1604  requires  an  input 
density  of  not  more  than  556  BPI,  PAR  has  written  a  simple  program  which 
is  operable  on  the  Honeywell  645  to  generate  a  tape  with  identical  format 
and  with  an  acceptable  density.  It  is  this  converted  tape  that  is  used  as 
input  to  the  edit  package. 

The remainder of this section will discuss the edit functions offered
to the operator (Section 6.1.), the computer set up and run deck construction
to activate the package (Section 6.2.) and program flow charts.


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6.  1.  EDIT  FUNCTION 

Each function made available to the operator may be executed by
selecting and depressing the appropriate function key; keys 1-30 are used.
The following describes each key and the associated function:

Key 1:    Read a character from the input tape.

          This operation will read the next character from the
          input tape (matrix size is 36 x 30) and display it on
          the BR-85. In addition, the character and author
          number are displayed over the character matrix.

Key 2:    Not Used

Key 3:    Backspace the input tape one character.

          On occasion the operator finds the need or desire to
          examine the character just processed; this function
          allows the repositioning of the input tape to accommodate
          this condition.

Key 4:    Not Used

Key 5:    Write End-Of-File on output tapes on logical units 4,
          5, and 6 (an example of the tape setup is shown under
          Tape Setup below).

          At the completion of each editing session the operator
          will write EOF on each of the output tapes by selecting
          Key 5; in addition, the selection of this key will
          cause the system to punch a card with the character
          ID and author number of the last character processed.
          This card provides a means of automatically positioning
          the input tape to the last character processed in
          subsequent editing sessions.

Key 6:    Not Used

Key 7:    Delete a row of noise.

          This function is used to delete all data points in the
          row selected by the light gun. Action: (a) Depress
          Key 7; (b) Select any data point in the row to be
          deleted using the light gun.

Key 8:    Delete a column of noise.

          To delete all data points in a given column depress
          Key 8 and select any data point in the desired column
          with the light gun.

Key 9:    Delete one data point of noise.

          Any given point of noise may be deleted by depressing
          Key 9 and selecting the point to be deleted with the
          light gun.

Key 10:   Delete column one of the character matrix.

          To delete column one, depress Key 10. Due to a
          programming error in the PDP-8 character matrix
          generation program, columns one and two frequently
          contain erroneous data points. This special function
          was included to speed the editing phase (Key 15 may
          be used to delete column two).

Key 11:   Change character ID.

          The PDP-8 character matrix generation program
          assigns a character ID in accord with the position of
          the character on the input form. If the physical
          character displayed on the BR-85 disagrees with the
          assigned character ID the operator may change the
          ID by depressing Key 11 and depressing the correct
          ID on the BR-85 keyboard.

Key 12:   Not Used

Key 13:   Restore an erroneous delete.

          After the completion of a delete operation, the user
          may restore the row, column, or point deleted by
          depressing Key 13. This operation will restore only
          the last deletion made.

Key 14:   Set output author number (ID).

          The number system employed by the PDP-8 program
          and that used by the FEATURE EXTRACTION and
          OLPARS programs are inconsistent. This function
          compensates for that inconsistency.

Key 15:   Delete column two of the character matrix.

          See description of Key 10.

Key 16:   Not Used.

Key 17:   Sort Edited Alpha/Numeric tape (physical unit 2).

          This function is used to extract characters from the
          tape on logical unit 10 and write them on logical
          unit 11 if numeric, on logical unit 12 if the character
          ID is an alpha, and on logical unit 13 if it is a special
          symbol.

Key 18 &
19:       Not Used

Key 20:   Write an edited character on physical unit 6.

          The characters written by this function are of a very
          distorted contour but are still recognizable. They
          are not included in the initial design of the decision
          logic but instead are included in a subsequent redesign
          of the decision logic. The function is activated by
          depressing Key 20.

Keys 21,
22, & 23: Backspace physical unit 4, 5, or 6.

          Backspace one record (character) on the selected
          physical unit; it is activated by depressing Key 21,
          22, or 23.

Key 24:   Find a character on the input tape.

          This function allows the operator to resume processing
          from the last character processed in the previous
          editing session. The function requires that the card
          punched when Key 5 (write EOF on units 4, 5, 6) was
          activated be in the card reader following the "EXECUTE"
          card. When Key 24 is depressed, the program will
          read the card in the reader and read the characters
          from the input tape (unit 2) until a match on character
          ID and author ID has been found.

Key 25:   Not Used.

Keys 26,
27, & 28: Skip to end-of-file on units 4, 5, or 6 (the output
          tapes).

          These functions, when selected, will advance the
          selected tape to the last record previously written
          on the tape.

Key 29:   Not Used.

Key 30:   Write the character on physical 4 and if an alpha
          character on physical 5.

          When the operator decides a character is acceptable
          for inclusion in the data base, he will depress Key 30.
          The resulting operation is to write the character on
          physical unit 4 regardless of nature (alpha/numeric)
          and if the character is alpha it will also be written
          onto physical unit 5.

Note: The two tape options, Key 20 and Key 30, will effect a write to the
      appropriate unit and will automatically read the next character on
      the input tape (physical 2).


Tape Setup

    Tape Unit
    Physical    Logical    Description

       1                   System master
       2          10       Input character tape
       3                   Edit overlay tape (binary)
       4          11       Alpha/Numeric output tape
       5          12       Symbols output tape
       6          13       Difficult character output tape


PROGRAM:  TAG  START 


Purpose:  This  is  the  EDIT  package  program;  it  connects  the  subroutine 

entry  point  addresses  with  the  appropriate  keys  on  the  BR-85. 


* - There is no return from DOCUS; control is transferred to the first
key program (Key #1 - "CONTROL").
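
The connection between function keys and subroutine entry points can be pictured as a lookup table. The sketch below is purely illustrative and is not the CDC-1604 program: only the pairing of Key 1 with CONTROL comes from the text, while the handler bodies, `KEY_TABLE`, `ignore_key` and `dispatch` are hypothetical.

```python
def control():
    """Stand-in for CONTROL: read and display the next character."""
    return "CONTROL: read next character"


def ignore_key():
    """Stand-in for the dummy routine entered on an unused key."""
    return "ignored"


# Key number -> entry point. Only Key 1 is filled in for this illustration.
KEY_TABLE = {1: control}


def dispatch(key):
    """Transfer control to the routine bound to the depressed key."""
    handler = KEY_TABLE.get(key, ignore_key)
    return handler()


print(dispatch(1))
print(dispatch(2))
```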
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SUBROUTINE:  CONTROL 

Purpose: CONTROL will read one character from the input tape
(physical 2) and display the character matrix on the BR-85.
In addition it extracts and displays the character and author
ID.
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no  return 
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SUBROUTINE:  TIMES3 

Purpose:  Extract  character  data  bit  from  input  buffer  and  move  it  to 

the  process  buffer  (3  bits  at  a  time) 
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SUBROUTINE:  SETDISP 

Purpose:  Insert  BR-85  control  and  position  character  for  one  display 

line  in  the  process  buffer. 




SUBROUTINE:  SETHREF 
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Purpose:  Used  in  construction  of  a  character  matrix.  This  matrix  is 

generated  from  the  BR-85  display  buffer  which  contains  all 
modifications  made  by  the  operator  (3  bits  at  a  time). 


[Flow chart: START; function PKBIT sets the bit in the output buffer to 0
if blank, or 1 if *; repeat until all bits are checked]
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SUBROUTINE:  SKIP  11,  SKIP12,  SKIP13 

Purpose: Advance the appropriate tape to the EOF and backspace over
the EOF mark.


* 

no  return 


SUBROUTINE:  MB  OB 


Purpose: 


This  is  a  dummy  routine  and  should  never  be  entered.  Its 
function  is  to  return  to  .DOCUS  if  an  illegal  function  key  is 
depressed  and  honored. 




SUBROUTINE:  HARD 

Purpose: Reconstruct a character matrix from the BR-85 display
image and write it on physical unit 6.


no  return 
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SUBROUTINE:  RESTORE 
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Purpose;  Restore  the  last  deletion  (row,  column,  point)  made  by  the 
operator.  When  the  operator  selects  any  of  the  delete  keys 
the  display  image  is  read  into  two  buffers;  one  buffer  is 
saved unchanged and the other is used to make the modification.
The modified buffer is then transferred to the BR-85.
When  this  program  is  activated  it  transfers  the  unchanged 
buffer  to  the  BR-85. 
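
The two-buffer scheme behind RESTORE can be sketched as follows. This is an illustration, not the original program: the class name `EditBuffer`, its method names, and the use of a list of lists of 0/1 values for the display image are assumptions.

```python
class EditBuffer:
    """Display image plus one saved copy, so only the last delete is undone."""

    def __init__(self, matrix):
        self.display = [row[:] for row in matrix]
        self.saved = None

    def _save_unchanged_copy(self):
        # Keep an unmodified copy of the image before applying a deletion.
        self.saved = [row[:] for row in self.display]

    def delete_row(self, r):
        self._save_unchanged_copy()
        self.display[r] = [0] * len(self.display[r])

    def delete_column(self, c):
        self._save_unchanged_copy()
        for row in self.display:
            row[c] = 0

    def delete_point(self, r, c):
        self._save_unchanged_copy()
        self.display[r][c] = 0

    def restore(self):
        # Transfer the unchanged buffer back to the display.
        if self.saved is not None:
            self.display = self.saved
            self.saved = None


buf = EditBuffer([[1, 1], [0, 1]])
buf.delete_row(0)
buf.delete_column(1)
buf.restore()
print(buf.display)  # [[0, 0], [0, 1]]: only the column delete is undone
```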


* 

no  return 
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SUBROUTINE:  DPOINT 

Purpose:  Set  parameter  MIND=3  to  indicate  a  delete  point  operation 

is  to  be  executed. 


* 

no  return 


SUBROUTINE:  DCOL 
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Purpose: 


Set  parameter  MIND=2  to  indicate  a  delete  column  operation 
is  to  be  executed. 


no  return 
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SUBROUTINE:  OROW 


Purpose:  Set  parameter  MIND=1  to  indicate  a  delete  row  operation  is 

to  be  executed. 


no  return 


SUBROUTINE:  TBACK 


* 

no  return 


SUBROUTINE:  NEWTAPE 


Purpose: Sort the input tape (physical 2) and write the numerics on physical 4, the alpha characters C, X, T, X, and Z on physical 5, and special symbols on physical 6.


SUBROUTINE:  OUTCHAR 




Purpose: Retrieve the current BR-85 display image of the character in process and reconstruct the character matrix. This format is identical to the PDP-8 image dissector matrix; however, it reflects any changes made by the operator.


SUBROUTINE:  SETA 


Purpose: Set the author number to the desired value. This operation permits the operator to set the author number to be assigned to the next output character written on either physical 4, 5, or 6.


[Flow diagram: START; read a card, FORMAT (5R1), columns 1-5; set JNUM(1) through JNUM(5) [text truncated] output card; KRTRN(6); no return.]


SUBROUTINE:  WEOF 

Purpose: Write EOF on the three output tapes (physical 4, 5, and 6).


[Flow diagram: START; write 2 EOFs on physical 4, 5, 6; punch a card containing the character and author ID of the last character processed on the input tape (physical 2); rewind 4, 5, 6; PAUSE 7777; KRTRN(6); no return.]
SUBROUTINE: CHARRD


Purpose:  Read  a  character  from  the  input  tape  (physical  2) 


SUBROUTINE:  CHARWRT 

Purpose: Write the current BR-85 display image on the appropriate output tape(s).


* 

no  return 
SUBROUTINE: DONE


SUBROUTINE:  DTWO 

Purpose: Set parameter MIND=5 to indicate that row two is to be deleted.


no  return 


SUBROUTINE:  DELETEX 

Purpose: This program will identify the delete operations selected by the operator, retrieve the current BR-85 display image, make the appropriate changes to that display, and transfer the new display image to the BR-85.




The following routines were written for OLPARS and are documented in the Final Report to Contract F30602-71-C-0367, RADC-TR-72-71, March 1972:


WRTDISP 

KRTRN(6) 

DOCUS 

ALFDISP 

READISP 

GUNLOOP 

FRAMER 

CHKBIT 

PKBIT 


6.2. COMPUTER/RUN DECK SET UP


6.2.1.  Density  Conversion 

The input tape to the Edit package is a 556 BPI tape with one character per physical record. This tape is generated on the 645 by using the original character generated on the PDP-8 at 800 BPI.

The following program will effect the 645 density conversion:

$      IDENT   /ISC-P, NAME, OCR, 00331, 5581
$      OPTION  FORTRAN
$      FORTRAN LSTOU
$      INCODE  IBMF
C     COPY 64-WORD RECORDS FROM UNIT 10 TO UNIT 11.
C     FLGEOF SETS NE WHEN AN END OF FILE IS READ ON UNIT 10.
      DIMENSION M(64)
      CALL FLGEOF(10, NE)
      CALL FXOPT(40, 1, 1, 0)
   10 READ (10) M
      IF (NE .EQ. 1) GO TO 300
      WRITE (11) M
      GO TO 10
  300 CALL FCLOSE(10)
      END FILE 11
      STOP
      END
$      EXECUTE DUMP
$      LIMITS  23, 11K
$      FFILE   10, NBUFFS/2, BUFSIZ/67, MLTFIL, FIXLNG/164, NSTDLB, NOSRLS
$      TAPE    10, AID,, NAME
$      FFILE   11, NBUFFS/2, BUFSIZ/67, MLTFIL, FIXLNG/64, NSTDLB, NOSRLS
$      TAPE    11, BID,, 99999
$      ENDJOB
***EOF


Note: In the above program, tape unit 10 is the input tape and unit 11 is the output tape. The "JOB" card must specify a density of 800 BPI for the input tape and 556 BPI for the output tape.


6.2.2. Edit Package

Tape  Assignment 

To  exercise  this  package  the  following  tape  assignments  are 
required: 

      Tape Unit
Physical    Logical    Description

1           1          System master with punch
2           10         Input character tape
3           3          Edit overlay tape
4           11         Alpha/numeric output tape
5           12         Symbol output tape
6           13         Difficult character output tape


Run  Deck  Setup 

(# is a 7, 9 punch in column 1)

#BEGIN JOB EDIT
#COOP, 116, EDIT, 1/10/3/11/12/13/56, 60, 99999q4, EDIT
#EXECUTE,, 56.
BLANK CARD (2 cards)

A BLANK CARD may be either a blank card or a card containing the character and author ID generated by the operator in a previous editing run when he selected Key 5 - WRITE EOF ON OUTPUT TAPES to terminate processing.



PDP-8 CHARACTER DIGITIZE - This program (provided by RADC) controls the IMAGE DISSECTOR (ITT) and converts each handprinted character to a matrix of 36 rows by 30 columns.

A character ID, determined by the character's position on the input form, and an author number are assigned, added to the digitized matrix, and written on the output tape.

Figure  -1  is  a  sample  form  used  in  the  collection  of  data. 

635 TAPE DENSITY CHANGE - Since the PDP-8 has a fixed density of 800 BPI and the CDC-1604 requires a density of either 256 or 556 BPI, this program is required. The only function performed is a density change; matrix size and organization are not affected.

CDC-1604 EDIT/SORT - The EDIT program is provided to compensate for human and hardware error and thereby possibly save a character that otherwise would have been deleted from the data base. Some typical problems that may be resolved are:

a. Mislabeled characters - an "A" character written on the form in a box where a "B" should have been.

b. Camera noise indicating a non-existent data point.

c. An incomplete erasure with a character overwrite.

d. Character overwrite without an erasure.

e. A character written so lightly that the matrix is incomplete.

The  1604  SORT  program  processes  a  tape  with  a  random 
sequence  of  characters  and  produces  an  output  tape  with  all 
characters  of  a  given  class  (class  A,  B,  1,  2,  etc. )  grouped 
in  consecutive  records. 

635 MATRIX REDUCTION - This routine is necessitated by the conflict between the matrix size (36 x 30) of a character digitized by the PDP-8 and the matrix size (24 x 16) expected by the FEATURE EXTRACTION program. The various algorithms in the FEATURE EXTRACTION subsystem were designed using a matrix of 24 x 16; any variation in this matrix size would render these algorithms useless.

635 FEATURE EXTRACTION - These algorithms convert a digitized character matrix into an OLPARS vector that, when in correct format, may be used for either character recognition design or character evaluation.

635 DATA CONVERSION - The input tape to the Edit package is a 556 BPI tape with one character per physical record. This tape is generated on the 645 by using the original character generated on the PDP-8 at 800 BPI.


PHASEONE, PHASONEA, TWO

The PHASEONE (P1) package computes an array of sums and a matrix of squared sums for each character for which logic is desired. The input to P1 consists of a series of feature vector tapes generated on the Honeywell 635 computer as described previously. The output from P1 is up to three (3) sum/sum square (S/SSQ) tapes which are used as input to the PHASONEA (P1A) package.

The PHASONEA (P1A) package computes the logic for all pairs of classes in the design character set. Logic is designed for one look pair (TR, TB, TL, RB, RL, or BL) for each pair of classes (36 classes results in 630 class pairs) for each of the 9 sort classes. Therefore, the decision logic for 36 classes contains 5670 (9 x 630) decision points. The input to P1A consists of the set of up to 3 S/SSQ output tapes from P1 and a deck of cards consisting of one card for each class pair with the pair looks to be used for that class pair. The output is a logic tape containing one record for each class pair and a printout of the logic. P1A may be started at any designated class pair and terminated after any desired class pair.

A  set  of  utility  programs  for  merging  a  series  of  logic  tapes 
generated  by  P1A  is  provided  in  the  utility  package  (a  complete 
logic  tape  for  36  classes  will  consist  of  630  records). 

The  TWO  package  evaluates  the  vectors  on  the  Standard  Input 
Tape  (SIT)  against  logic  generated  by  the  P1A  package. 

SECTION 7

ALPHA  NUMERIC  LOGIC  PACKAGE  (ANLP) 

The ANLP performs the generation of discrimination logic, preliminary evaluation, and independent testing of handprinted alpha-numeric characters. Three major program packages have been developed to perform these tasks, along with a small set of utility programs for tape editing. Each set of programs will process up to 48 separate characters; however, for the current contract only the 10 numerics and 26 alpha characters have been processed. Each program will be described in the following format: (1) an introduction to the purpose and general operation of the program; (2) a description of the input cards and tapes, output cards and tapes, and general run time instructions; (3) a listing of possible program pauses and corrective measures (if possible); and (4) a functional flow diagram of the program package.


To  generate  and  test  logic  via  the  ANLP  package,  then,  requires 
the  user  to  operate  the  following  program  packages  in  sequence:  PHASEONE, 
PHASONEA,  TWO;  where  PHASEONE  computes  a  measurement  sum  array 
and  scatter  matrix  for  each  possible  look  pair  from  the  Standard  Input  Tape 
(SIT),  PHASONEA  utilizes  PHASEONE  output  and  the  pair  look  deck  to 
generate  a  final  logic  tape,  and  TWO  evaluates  the  logic  against  vectors 
on  any  SIT.  Two  basic  inputs,  then,  are  required  by  ANLP:  The  SIT  and 
the  Pair  Look  Deck,  Page  A-8,  A-9,  A-13,  A-14,  A-15/A-16. 


7.1. PHASEONE

The PHASEONE (P1) package computes an array of sums and a matrix of squared sums for each character for which logic is desired. The input to P1 consists of a series of feature vector tapes generated on the Honeywell 635 computer as described previously. The output from P1 is up to three (3) sum/sum square (S/SSQ) tapes which are used as input to the PHASONEA (P1A) package.

OLPARS  Alpha-Numeric  Data  Tape  Format 

The data tape for the ANLP package (P1 and TWO) is represented symbolically as follows:

Bottom view measurements, 21 numbers, 3 characters/number
Right view measurements, 21 numbers, 3 characters/number
Top view measurements, 21 numbers, 3 characters/number
Left view measurements, 21 numbers, 3 characters/number
Additional measurements, 26 numbers, 3 characters/number
Character symbol
Sequence number
Number of bottom convexities
Number of right convexities
Number of top convexities
Number of left convexities

The input tape from the 635 will be a BCD tape. Each vector on the tape will consist of three BCD records, each record consisting of 120 decimal characters. Each vector is broken into three parts: (1) standard measurements, (2) additional measurements, and (3) identification data.

(1)  Standard  Measurements 


The standard measurements occupy the first 252 decimal characters of each vector. Each measurement occupies three decimal characters and includes a sign and two decimal digits. The total of 84 standard measurements is divided into four equal subgroups of 21 measurements each. The four subgroups represent the four "looks" used to measure the character; namely, bottom, right side, top, and left side, in that order. Each subgroup is further divided into five convexities. The first convexity pertains to the first five measurements of the subgroup; the second through fifth convexities each pertain to four measurements, making a total of 21 measurements in all. The subgroups always contain 21 measurements, even when only three convexities or one convexity are actually supplied. If one convexity is supplied, it will occupy the first five measurements of the subgroup, with the remaining 16 measurements set to zero. If three convexities are supplied, they will occupy the first 13 measurements of the subgroup, with the remaining 8 measurements set to zero. Each of the four subgroups will always contain at least one convexity.

(2)  Additional  Measurements 


Following the 252 decimal characters which contain the standard measurements are 78 decimal characters which contain room for 26 additional measurements. Each measurement occupies three decimal characters and includes a sign and two decimal digits. Eight additional measurements were calculated from the character to further help recognition. The number of additional measurements to be used for each character is supplied by the pair-look parameter deck. The eight additional features supplied for each character occupy the first eight additional measurement positions for this vector, with the remaining 18 measurements set to zero.

(3)  Identification  Data 

Following the 330 decimal characters which contain the standard and additional measurements are ten decimal characters which contain identification data pertaining to this vector. The first character of the identification data contains the actual character in question. The next 5 characters contain the author ID, as four digits and a sign. The remaining four characters contain four decimal digits which pertain to the four looks described above. Each digit contains either a 1, 3, or 5, depending upon the number of convexities contained in the corresponding subgroup within the standard measurement portion of this vector. The first digit corresponds to the first subgroup, the second digit to the second subgroup, etc. The remaining 20 decimal characters of the vector are not used.
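
The vector layout described above (252 characters of standard measurements, 78 of additional measurements, 10 of identification data, 20 unused) can be illustrated with a short Python sketch. This is not the original 1604 code. The names `FeatureVector`, `parse_signed`, and `parse_vector` are hypothetical, and the sketch assumes that the three 120-character records have already been concatenated and that the sign is written as a leading "+", "-", or blank.

```python
from dataclasses import dataclass
from typing import Dict, List

# Order of the four "looks" on the tape, as stated in the text.
LOOKS = ("bottom", "right", "top", "left")


@dataclass
class FeatureVector:
    standard: Dict[str, List[int]]   # look name -> 21 measurements
    additional: List[int]            # 26 additional measurement slots
    symbol: str                      # the actual character
    author_id: int                   # four digits and a sign
    convexities: Dict[str, int]      # look name -> 1, 3, or 5


def parse_signed(field: str) -> int:
    """Decode a sign-and-digits field; a blank field is read as zero (assumption)."""
    text = field.strip()
    return int(text) if text else 0


def parse_vector(chars: str) -> FeatureVector:
    """Split one 360-character vector into its three documented parts."""
    if len(chars) != 360:
        raise ValueError("expected three 120-character records (360 characters)")
    # 110 three-character fields: 84 standard + 26 additional measurements.
    fields = [parse_signed(chars[i:i + 3]) for i in range(0, 330, 3)]
    standard = {look: fields[21 * k:21 * (k + 1)] for k, look in enumerate(LOOKS)}
    additional = fields[84:110]
    ident = chars[330:340]           # characters 341-360 are unused
    convexities = {look: int(ident[6 + k]) for k, look in enumerate(LOOKS)}
    return FeatureVector(standard, additional, ident[0],
                         parse_signed(ident[1:6]), convexities)
```

A subgroup with three convexities carries data in its first 13 slots only, so a caller would normally read `standard[look][:4 * c + 1]`, where `c` is the convexity count for that look.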

Pair  Look  Deck  Format 

The pair-look parameter deck contains pertinent information concerning all possible pairs for a given 635 BCD input tape. Needed for each possible pair is such information as which looks are to be used when evaluating the pair, as well as the number of additional measurements to be used for this pair (see (2) above). To facilitate easy updating of this deck, each pair will be represented by a separate card. Each card will have the following format:

Columns    Contents

1          Class 1 character
2          Class 2 character
3-7        Blank
8          Look 1 (1 = bottom, 2 = right side, 3 = top, 4 = left side)
9          Blank
10         Look 2 (same as Look 1)
11-16      Blank
17-18      Number of additional measurements to be used for this pair, right adjusted.
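
As an illustration of the card format above, the following Python sketch decodes one pair-look card image. It is a reading aid under stated assumptions, not the original card reader: `PairLook` and `parse_pair_look_card` are hypothetical names, and a blank count in columns 17-18 is assumed to mean zero additional measurements.

```python
from typing import NamedTuple

# Look codes as given in the card format.
LOOK_NAMES = {1: "bottom", 2: "right side", 3: "top", 4: "left side"}


class PairLook(NamedTuple):
    class_1: str
    class_2: str
    look_1: int
    look_2: int
    n_additional: int


def parse_pair_look_card(card: str) -> PairLook:
    """Decode one card image; card columns are 1-based, Python slices 0-based."""
    card = card.ljust(80)                 # pad short images to a full card
    look_1 = int(card[7])                 # column 8
    look_2 = int(card[9])                 # column 10
    if look_1 not in LOOK_NAMES or look_2 not in LOOK_NAMES:
        raise ValueError("look codes must be 1 through 4")
    count = card[16:18].strip()           # columns 17-18, right adjusted
    return PairLook(card[0], card[1], look_1, look_2, int(count) if count else 0)
```

For example, a card with "AB" in columns 1-2, "3" in column 8, "2" in column 10, and " 8" in columns 17-18 selects the top and right-side looks with eight additional measurements for the A/B pair.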

For each character, fifty-four (54) records (6 "look" pairs of 9 sort classes each) are written on the S/SSQ tapes in the following sequence:

Look Pair #    Looks    Record #'s

1              TR*      1-9
2              TB       10-18
3              TL       19-27
4              RB       28-36
5              RL       37-45
6              BL       46-54

Sort Class #    Sort Class

1               5,5
2               5,3
3               5,1
4               3,5
5               3,3
6               3,1
7               1,5
8               1,3
9               1,1

* - T indicates top view, R - right side, B - bottom, L - left side

Thus, each vector input to P1 contributes to six records, one for each look pair. Each S/SSQ tape contains the data for up to sixteen (16) alpha-numeric characters. In addition, a header record is written on each S/SSQ tape containing the number of classes to be input and their symbols in sequence. Under this contract, S/SSQ #1 has characters 0 - F (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F); S/SSQ #2 has G - V (G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V); and S/SSQ #3 contains W - Z (W, X, Y, Z). The P1 package may operate in the tape initiate or tape modify mode. In the initiate mode, the 54 records for each character are written in the order of character input. The tape modify mode reads the previous S/SSQ matrix for the character onto the data drum of the 1604B, adds the information from the new vectors to the various matrices, and outputs the modified records. In both modes, a restart capability allows the user to commence operation with any character. Sense switch settings allow the user control over system wrap-up and data printouts. In either mode, vectors may be eliminated from consideration by use of a discard vector tape generated by program MAKE (see Utilities Section).
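
The S/SSQ layout described above can be sketched in Python as follows. The function names are hypothetical, and one point is an assumption rather than a statement from the report: within each look pair's block of nine records, the records are taken to follow sort class numbers 1 through 9.

```python
# Class symbols in design order; sixteen characters per S/SSQ tape.
DESIGN_SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHARS_PER_TAPE = 16

LOOK_PAIRS = ("TR", "TB", "TL", "RB", "RL", "BL")
SORT_CLASSES = ((5, 5), (5, 3), (5, 1), (3, 5), (3, 3), (3, 1), (1, 5), (1, 3), (1, 1))


def ssq_tape_for(symbol: str):
    """Return (S/SSQ tape number, 0-based slot of the character on that tape)."""
    position = DESIGN_SYMBOLS.index(symbol)
    return position // CHARS_PER_TAPE + 1, position % CHARS_PER_TAPE


def record_within_character(look_pair: str, sort_class) -> int:
    """Return the record number (1-54) among one character's S/SSQ records.

    Assumes the nine records of a look pair are ordered by sort class number.
    """
    block = LOOK_PAIRS.index(look_pair)       # 0-5, nine records per block
    offset = SORT_CLASSES.index(sort_class)   # 0-8 within the block
    return block * len(SORT_CLASSES) + offset + 1
```

Under these assumptions, character G is the first character on S/SSQ #2, and the record for looks RB with sort class 3,3 is the fifth record of the fourth block of nine.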

Run  Time  Instructions  for  PHASEONE 

(1) Compile and execute the P1 package.

(2) The program will halt at Pause 111 (111 octal in the A register on the 1604 console) to allow for mounting of tapes.

(a)  Mount  a  scratch  tape  on  physical  tape  drive  3  to  be 
used  as  the  output  S/SSQ  tape. 

(b)  Mount  the  appropriate  current  S/SSQ  tape  on  physical 
tape  drive  4  (the  appropriate  S/SSQ  tape  is  that  tape  containing  the  character 
with  which  the  current  run  will  begin  input). 

(c)  Mount  the  appropriate  input  tape  on  physical  tape 

drive  5. 

(d)  Mount  the  Discard  Vector  tape  (if  appropriate)  on 
physical  tape  drive  6. 

(3) A number of sense switch (SS) settings are available which allow the user to control program wrap-up and printouts. To set a sense switch during program operation, the user may hit the carriage return, then type "f, n/1", where n is the sense switch number to be set. The procedure for setting and clearing sense switches is further discussed in the COOP Users' Guide, Page 5-1. For the P1 package, setting SS1 will cause the program to complete its output at the completion of the current data tape; SS2 will have the same effect at the completion of the current data character input; SS3 will suppress printout of the feature values of each input vector; and SS4 will suppress printout of the entire vector, heading as well as feature values.

(4) Two data cards are required for the operation of P1:

(a) A card with column 1 punched with the initial character to be processed under the current run. P1 will search the input tape for this character and will copy from the current S/SSQ tape onto the new S/SSQ tape those characters which fall before the initial character on the appropriate S/SSQ tape. Column 2 of this card contains the program mode indicator. For Mode = 0, P1 assumes no previous data output and expects a third input card to follow card 2. This mode is to be used only when initializing a set of S/SSQ tapes. When Mode = 1, an update procedure will be performed: S/SSQ tape records will be expected on the current tape, and will be added to the vectors input from the current data input tape. For Mode = 2, S/SSQ records will be created entirely from the input data set, and the current S/SSQ tape will only be utilized to copy those S/SSQ's which fall prior to the initial character to be processed on the current input operation.


(b) The second input card signals the existence of a Vector Discard tape on physical tape drive 6 via a non-zero character in columns 1-5.

(c) The third input card (necessary only if column 2 of card 1 = 0) contains the number of data characters to be input in the entire design (columns 1-2) and up to 48 class symbols for the data set (columns 3-50). The characters punched in columns 3-50 must be listed in the order of data input.
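
The third input card can be decoded as in the following Python sketch. It is illustrative only: `parse_class_card` is a hypothetical name, and the sketch assumes the count in columns 1-2 is a decimal number and that exactly that many symbols are read from columns 3-50.

```python
from typing import List, Tuple

MAX_CLASSES = 48   # columns 3-50 hold at most 48 class symbols


def parse_class_card(card: str) -> Tuple[int, List[str]]:
    """Return (number of data characters, class symbols in order of data input)."""
    card = card.ljust(80)                  # pad short images to a full card
    count = int(card[0:2])                 # columns 1-2
    if not 1 <= count <= MAX_CLASSES:
        raise ValueError("class count must be between 1 and 48")
    symbols = list(card[2:2 + count])      # columns 3 onward, one symbol per column
    return count, symbols
```

For the 36 classes used under this contract, the card would carry "36" followed by the symbols 0 through 9 and A through Z in the order in which the data are input.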


Pause Number    Event                                  Continuation Action

22              End of Tape on Tape 4                  Restart
23              Parity on Tape 4                       Restart
32              End of Tape on Tape 3                  Restart
33              Parity on Tape 3                       Restart
77              Parity error on data input tape        Continue; vector will be skipped
600             Program Entrance                       Mount tapes & continue
610             Character in col. 1 of input card 1    Restart; correct appropriate card
                not found in class list
704             Character on input tape
704             End of Data Input Tape                 a. Mount another tape and continue
                                                       b. Set sense switch 1 and continue;
                                                          P1 will complete output
1211            Final Output Complete                  None

[Flow diagram fragment: read card 3; set number of characters = # of data; set symbols class list = character symbols; IVDT=2.]


7.2. PHASONEA

The PHASONEA (P1A) package computes the logic for all pairs of classes in the design character set. Logic is designed for one look pair (TR, TB, TL, RB, RL, or BL) for each pair of classes (36 classes results in 630 class pairs) for each of the 9 sort classes. Therefore, the decision logic for 36 classes contains 5670 (9 x 630) decision points. The input to P1A consists of the set of up to 3 S/SSQ output tapes from P1 and a deck of cards consisting of one card for each class pair with the pair looks to be used for that class pair. The output is a logic tape containing one record for each class pair and a printout of the logic. P1A may be started at any designated class pair and terminated after any desired class pair. A set of utility programs for merging a series of logic tapes generated by P1A is provided in the utility package (a complete logic tape for 36 classes will consist of 630 records). The logic records are ordered in the following manner for n classes:


Logic Tape Record Number              Class Pair

1 -> n-1                              1,2 -> 1,n
n -> 2n-3                             2,3 -> 2,n
2n-2 -> 3n-6                          3,4 -> 3,n
  .                                     .
  .                                     .
n(n-1)/2 - 2 -> n(n-1)/2 - 1          (n-2),(n-1) -> (n-2),n
n(n-1)/2                              (n-1),n
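
The record ordering above has a simple closed form, sketched here in Python as a check on the table. The function name `pair_index` is hypothetical, and class indices are assumed to run from 1 to n.

```python
def pair_index(lower: int, higher: int, n_classes: int) -> int:
    """Logic tape record number for the class pair (lower, higher), 1-based."""
    if not 1 <= lower < higher <= n_classes:
        raise ValueError("need 1 <= lower < higher <= n_classes")
    total_pairs = n_classes * (n_classes - 1) // 2
    remaining = n_classes - lower
    # Pairs whose lower class is >= `lower` fill the last remaining*(remaining+1)/2 records.
    return total_pairs - remaining * (remaining + 1) // 2 + higher - lower


def all_pairs_in_tape_order(n_classes: int):
    """Enumerate class pairs in the order 1,2 ... 1,n, 2,3 ... 2,n, and so on."""
    return [(a, b) for a in range(1, n_classes) for b in range(a + 1, n_classes + 1)]
```

For n = 36 the pairs involving class 1 occupy records 1 through 35, class 2 starts at record 36, class 3 ends at record 102, and the final pair (35, 36) is record 630.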

Therefore, the logic records produced for the 36 alpha-numeric characters begin with the character 0 vs. character 1 logic, then 0/2, through 0/Z, followed by 1/2 through 1/Z, etc. The individual logic records are formatted as follows:

Word         Contents

1            Number of words within the physical record

2            5,5 sort class logic type code

             1 = Decide class 1 - There were 2 or more class one vectors for this pair and sort class, but 1 or fewer class two vectors.

             2 = Decide class 2 - There were 2 or more class two vectors for this pair and sort class, but 1 or fewer class one vectors.

             3 = No Vote - There were 1 or fewer vectors in both classes for this pair and sort class.

             NDIM = number of weights for a Fisher record or 1 discriminant line.

             N(NDIM) - where N is the number of discriminant lines. (If this field is negative, it implies that the Fisher record has been reversed; i.e., the B/A logic was constructed by reversing the A/B logic.)

3            5,3 sort class logic type code (see (5,5))

.
.

10           1,1 sort class logic type code (see (5,5))

11 - last    Fisher and discriminant logic for all sort classes for this pair. A Fisher record will consist of NDIM weights followed by a threshold. One discriminant line will consist of NDIM weights followed by a threshold. For two or more discriminant lines for a sort class, there will be N(NDIM) weights followed by N thresholds, where N is the number of discriminant lines.
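
The weight and threshold layout just described can be sketched in Python. The helper `split_logic` is hypothetical, and it assumes that the N(NDIM) weights are stored line by line (all NDIM weights of the first discriminant line, then the second, and so on); the report does not state the ordering within the weight block.

```python
from typing import List, Sequence, Tuple


def split_logic(values: Sequence[float], ndim: int,
                n_lines: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Split one sort class's logic into per-line weights and thresholds.

    The block holds n_lines * ndim weights followed by n_lines thresholds;
    a Fisher record is the n_lines == 1 case.
    """
    expected = n_lines * ndim + n_lines
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    weights = [list(values[i * ndim:(i + 1) * ndim]) for i in range(n_lines)]
    thresholds = list(values[n_lines * ndim:])
    return weights, thresholds
```

With NDIM = 3 and two discriminant lines, a block of eight values yields two weight vectors of three values each and two thresholds.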

Card  Input  Format 

There  are  two  input  cards  containing  parameters  to  this  program. 
These  cards  will  be  read  at  the  beginning  of  each  run.  The  format  of  these 
cards  is  as  follows: 



Card One

Columns    Format    Description

1 - 4      I4        The total number of PROGRAM ONE output tapes (maximum 10) (IPO)
5 - 8      I4        Total number of special OLPARS discriminant tapes (maximum 10) (IOD)
11 - 12    I4        Total number of characters (currently 36)
13 - 16    I4        Number of sort classes per pair (currently 9)

Run Time Instructions for PHASONEA

(1) Compile and execute the P1A package.

(2) The program will halt at Pause 1111 (1111 octal in the A register on the 1604 console) to allow the mounting of tapes.

(a)  Mount  the  S/SSQ  Tape  #1  on  physical  tape  3 


(b)  Mount  S/SSQ  #2  on  physical  tape  5 

(c)  Mount  S/SSQ  #3  on  physical  tape  4 

(d)  Mount  a  scratch  tape  on  tape  drive  6  to  be  used  as 
the  output  logic  tape. 

(3) Two sense switch settings are available to allow user control over program wrap-up and printout. Setting SS1 will suppress the output of the logic record printout. Setting SS2 will cause P1A to complete processing following the next logic record output.


(4) One data card is required in addition to the pair look deck (which follows the data card) for the operation of P1A.

(a) The start card contains the initial pair of classes for which logic is to be designed on the current run. The higher indexed class for the alpha-numerics, in the array 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ, is punched in column 1 and the lower indexed class in column 2.

Pause Number    Event                                                   Continuation Action

104             End of tape on S/SSQ tape during record skipping        Hardware malfunction
105             Parity on S/SSQ tape during record skipping             Continue
114             End of tape on S/SSQ tape during read of first class    Hardware malfunction
115             Parity on S/SSQ tape during read of first class         Continue or restart
214             End of tape on S/SSQ tape during read                   Hardware malfunction
215             Parity on S/SSQ tape during read                        Continue or restart
504             End of tape on S/SSQ tape during record skipping        Hardware malfunction
505             Parity on S/SSQ tape during record skipping             Continue
11111           Program Entrance                                        Mount tapes and continue
105105          Program Completion                                      None

TWO

The TWO package evaluates the vectors on a Standard Input Tape (SIT) against logic generated by the P1A package.

The  inputs  to  the  TWO  logic  evaluation  consist  of  the  following: 

o  a  complete  block  of  logic  for  each  designation  class 

o  the table of pairwise "looks" which indicates which measurements are to be used in evaluating a vector under all possible pairwise class evaluations.

The  output  of  PROGRAM  TWO  logic  evaluation  consists  of  the 
following: 

o  for  each  logic  error  made  (that  is,  a  pairwise  evaluation  when 
one  class  of  the  pair  is  the  true  class  designation)  the  following 
is  printed  under  headings  on  the  right  side  of  the  printer 
paper:  the  Vector  Index,  the  sort  class  Record  Type,  the 
Number  of  logical  errors  accumulated  for  the  current  vector 
index  value,  and  the  Class  to  which  the  vote  is  assigned. 

o  for each vector evaluation error made (that is, a final classification made to an incorrect class) the following is printed under headings on the left side of the printer paper: the Vector Index, the correct class evaluation, the votes tallied for and against the correct class, the final (incorrect) class evaluation, and the votes tallied for and against the incorrect class.

o  for  each  evaluation  tie  made  (that  is,  the  logic  evaluation 

resulted  in  a  tie  between  the  correct  class  and  one  or  more 
others)  the  following  is  printed  under  headings  on  the  left 
side  of  the  paper:  the  Vector  Index,  the  true  class  symbol, 
the  votes  tallied  for  and  against  the  correct  class,  and  the 
other  classes  involved  in  the  tie. 

o  for each complete class of vectors evaluated, a recap of the logic results is printed which contains the following: the class symbol designation, the total number of vectors evaluated under that designation, the votes required for a perfect evaluation (that is, no logic errors made for an individual vector), and the number of correctly classified vectors which accumulated each vote count.

The vote assignment procedure is described below:

o  The  vector  is  first  evaluated  using  its  "true"  class  logic 
against  all  other  class  possibilities.  After  a  logic  run  is 
completed  the  vote  tally,  and,  if  necessary,  the  logic  select 
procedures  are  operated. 

o  For each pairwise evaluation which has not been accomplished previously, a pair index is computed (total pairs - (number of classes - lower class index) x (number of classes - lower class index + 1) / 2 + higher class index - lower class index).

o  The appropriate pair "looks" are selected for the computed pair and the sort class values for those looks are extracted from the last word of the vector. The number of measurements to be used is computed at this time ((sort class 1 value + sort class 2 value) * 4 + 2 + extra measurements). The logic record type is determined.

o If a Fisher or Discriminant evaluation is called for, the
number of lines is computed (number of lines = |record type| /
number of measurements to be evaluated), and the
locations of the "weights" and the "thresholds" are found
within the logic record. The proper measurements are then
extracted from the vector and the standard evaluation is
performed, a "win" vote and a "loss" vote are assigned, and the
evaluation continues (after a logic error printout if
appropriate).

o If an arbitrary vote record type is assigned, the "win" and
"loss" votes are assigned (a "no decision" record causes
"loss" votes to be assigned to both pair classes) and the
logic evaluation continues.
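
The pair index computation in the list above can be sketched as follows. This is a minimal illustration and not the original 1604 code: it assumes class indices start at 1, that "total pairs" means N(N-1)/2 for N classes, and the function name pair_index is hypothetical.

```python
def pair_index(lower: int, higher: int, num_classes: int) -> int:
    """Index of the (lower, higher) class pair.

    Assumes 1-based class indices and total pairs = N * (N - 1) / 2.
    """
    if not 1 <= lower < higher <= num_classes:
        raise ValueError("expected 1 <= lower < higher <= num_classes")
    total_pairs = num_classes * (num_classes - 1) // 2
    remaining = num_classes - lower
    # total pairs - (N - lower)(N - lower + 1)/2 + higher - lower
    return total_pairs - remaining * (remaining + 1) // 2 + higher - lower


if __name__ == "__main__":
    n = 4
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            print(i, j, pair_index(i, j, n))
```

Under these assumptions the pairs (1,2), (1,3), ..., (N-1,N) are numbered consecutively from 1 to the total number of pairs, so the index can address a packed table of pairwise logic records.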

The  Vote  Tally  procedure  is  described  below: 

o The Vote Tally procedure operates at the conclusion of each
logic  run  (that  is,  the  in-core  logic  has  been  evaluated  for 
each  pairwise  combination  for  the  current  vector). 

o A "temporary winner" is found by selecting a class which has
the most "win" votes and has completed its vote tabulation
(the  sum  of  "win"  votes  and  "loss"  votes  for  that  class  equals 
the  number  of  classes  minus  one). 

o A "final winner" is determined when the "temporary winner"
is found to have the fewest "loss" votes among all the classes,
including those for which vote tabulation is incomplete. When
a "final winner" has been determined, it is checked against
the true class, the appropriate printout is generated (if
necessary), and the next vector is called for evaluation.

o If there is any class which has fewer "loss" votes than the
temporary winner, the Logic Select procedure is operated.

o  If  there  are  one  or  more  classes  which  have  the  same  number 
of  "loss"  votes  as  the  temporary  winner,  then  each  class  in 
that category is checked to determine if the vote tabulation
is  complete  for  that  class.  If  there  are  any  of  these  classes 
which  fail  this  check,  the  Logic  Select  procedure  is  placed 
into  operation,  otherwise  the  tie  vote  printout  is  generated 
and  the  next  vector  called  for  evaluation. 
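
The Vote Tally rules above can be sketched as a single decision function. This is an illustrative reading of the text, not the original program: the function name vote_tally, the dictionary representation of the vote counts, and the alphabetical tie-break among equally ranked temporary winners are all assumptions.

```python
from typing import Dict, List, Tuple


def vote_tally(wins: Dict[str, int], losses: Dict[str, int]) -> Tuple[str, List[str]]:
    """Return ("final", [winner]), ("tie", tied classes) or ("select", []).

    "select" means the Logic Select procedure would be operated next.
    """
    num_classes = len(wins)
    # A class has completed its tabulation once it has met every other class.
    complete = [c for c in sorted(wins) if wins[c] + losses[c] == num_classes - 1]
    if not complete:
        return ("select", [])
    # Temporary winner: most "win" votes among the complete classes.
    temp = max(complete, key=lambda c: wins[c])
    others = [c for c in sorted(wins) if c != temp]
    # Any class with fewer losses than the temporary winner forces another run.
    if any(losses[c] < losses[temp] for c in others):
        return ("select", [])
    tied = [c for c in others if losses[c] == losses[temp]]
    if not tied:
        return ("final", [temp])
    # Classes with equal losses must all be complete before a tie is declared.
    if any(c not in complete for c in tied):
        return ("select", [])
    return ("tie", [temp] + tied)


if __name__ == "__main__":
    print(vote_tally({"A": 2, "B": 0, "C": 0}, {"A": 0, "B": 1, "C": 1}))
```

In the usage line, class A has met both other classes and has no losses, so it is reported as the final winner without evaluating the B-versus-C pair.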

The  Logic  Select  procedure  is  described  below: 

o The Logic Select procedure determines which class logic is to
be evaluated when a "final winner" cannot be determined at
the end of a logic run.

o  The  class  logic  selected  is  determined  by  finding  the  class 
with the fewest losses which has not had a complete vote
tabulation. If there is more than one of these, the one with
the highest number of "win" votes is selected.
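
The Logic Select rule can be sketched in a few lines. As before, this is a hedged illustration: the name select_next_logic and the dictionary inputs are hypothetical, and the alphabetical order used when losses and wins are both equal is an assumption the text does not address.

```python
from typing import Dict, Optional


def select_next_logic(wins: Dict[str, int], losses: Dict[str, int]) -> Optional[str]:
    """Pick the class whose logic is evaluated on the next logic run.

    Among classes whose vote tabulation is incomplete, choose the one with
    the fewest "loss" votes; break ties by the highest number of "win" votes.
    Returns None when every class is already complete.
    """
    num_classes = len(wins)
    incomplete = [c for c in sorted(wins) if wins[c] + losses[c] != num_classes - 1]
    if not incomplete:
        return None
    return min(incomplete, key=lambda c: (losses[c], -wins[c]))


if __name__ == "__main__":
    # A has met both other classes; B and C have not yet met each other.
    print(select_next_logic({"A": 1, "B": 1, "C": 0}, {"A": 1, "B": 0, "C": 1}))
```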

Run  Time  Instructions 

(1)  Compile  and  execute  the  TWO  package. 

(2) The program will come to Pause 111 (111₈ in the A register
on the 1604 console).

(a) Mount the SIT on physical tape 5.

(b)  Mount  the  design  logic  tape  on  physical  tape  drive  3. 


(3) Four sense switch settings are available to allow user
control over program wrap-up and printout. Setting SS1 will cause TWO to
output final class statistics and complete operations immediately. Setting
SS3 will cause the system to generate a primary audit trail for each logic
evaluation event. If the user sets SS4, each incorrectly evaluated vector
will be printed; and SS6 will cause the generation of a detailed audit trail
for each input vector.

(4)  Two  data  cards  are  required  for  the  execution  of  TWO: 

(a) Card 1 contains an integer value in column 1 representing
the operating mode of TWO. When the mode is zero (0), all vectors
on the input tape will be evaluated. With a mode value of one (1),
evaluation of a single class is currently limited to 101 consecutive vectors, after
which the SIT vectors will be skipped until a vector of a new class is input. A
mode value of two (2) will cause a printout in the format "a/b nnnnn" for
each input character, where a is the final character evaluation, b is the
symbol attached to the input character, and nnnnn is the author identification
value for that character. *

(b) Card 2 for program TWO is the same as card 3 for
PI; that is, the number of data characters in the logic design in columns
1-2 and the class symbols for the data set in columns 3-50, listed in the order
of original data input.
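
The two data cards can be read as fixed-column records. The sketch below is only an illustration of the column layout described above: the function names are hypothetical, one class symbol per column is assumed, and zero-padding of the five-digit author identification value in the mode 2 line is an assumption.

```python
from typing import List, Tuple

VALID_MODES = (0, 1, 2)


def parse_card1(card: str) -> int:
    """Operating mode of TWO, taken from column 1 of card 1."""
    mode = int(card[0])
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode {mode}")
    return mode


def parse_card2(card: str) -> Tuple[int, List[str]]:
    """Class count from columns 1-2 and class symbols from columns 3-50."""
    count = int(card[0:2])
    symbols = list(card[2:50])[:count]  # assumes one symbol per column
    if len(symbols) != count:
        raise ValueError("fewer class symbols than the declared count")
    return count, symbols


def format_mode2_line(decision: str, label: str, author_id: int) -> str:
    """Mode 2 printout line in the form "a/b nnnnn"."""
    return f"{decision}/{label} {author_id:05d}"


if __name__ == "__main__":
    print(parse_card1("0"))
    print(parse_card2("03ABC"))
    print(format_mode2_line("A", "B", 123))
```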


Pause Number   Event                               Continuation Action

1              End of file on logic input tape     Hardware malfunction
2              Parity error on logic input tape    Continue or restart
77             Parity error on data input tape     Continue
111            Program Entrance                    Mount tapes and continue
710            End of file on data input tape      a. Set sense switch and continue
                                                   b. Mount another input tape and continue
1211           Program Completion                  None

*  -  Not  implemented 

[Flowchart of program TWO. Recoverable steps: PAUSE 111; read card 1 (IMODE = mode setting); read card 2 (NUMCLASS = number of classes in design set, ISYM = class symbols in design set); read pair look deck into IPAIR array; read logic tape into drum; print out last class statistics; PAUSE 1211.]

7.4. UTILITY PROGRAMS

7.4.1.  MAKE 

Program  MAKE  creates  a  discard  vector  tape  on  physical  tape  6 
for  input  into  PI  from  a  punched  card  deck.  The  input  deck  consists  of 
card  3  from  PI  plus  a  card  for  each  vector  to  be  discarded  with  the  symbol 
for  the  vector  in  column  1  and  the  author  identification  value  in  columns 
2-6.  The  cards  must  be  grouped  by  class  and  ordered  (by  class)  in  the  same 
manner as the design data input. The output tape receives the total number
of discarded vectors, a table of indices into the first vector to be discarded
from a given class, and a list of up to 3000 author identification
values to be discarded.
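
The content MAKE writes to the discard tape can be sketched from the card layout above. This is an illustrative reconstruction only: the function name build_discard_table, the zero-based indices, and the use of a dictionary for the per-class index table are assumptions, and no tape format is modeled.

```python
from typing import Dict, List, Sequence, Tuple


def build_discard_table(
    class_symbols: Sequence[str], cards: Sequence[str], max_discards: int = 3000
) -> Tuple[int, Dict[str, int], List[int]]:
    """Build (total, first-index table, author id list) from discard cards.

    Each card holds the class symbol in column 1 and the author
    identification value in columns 2-6. Cards must be grouped by class,
    in the same class order as the design data input.
    """
    order = {symbol: rank for rank, symbol in enumerate(class_symbols)}
    author_ids: List[int] = []
    first_index: Dict[str, int] = {}
    last_rank = -1
    for card in cards:
        symbol, author_id = card[0], int(card[1:6])
        rank = order[symbol]  # KeyError if the symbol is not in the design set
        if rank < last_rank:
            raise ValueError("cards must be grouped by class in design order")
        last_rank = rank
        # Record where this class's discards start (zero-based, an assumption).
        first_index.setdefault(symbol, len(author_ids))
        author_ids.append(author_id)
    if len(author_ids) > max_discards:
        raise ValueError("more discard cards than the list can hold")
    return len(author_ids), first_index, author_ids


if __name__ == "__main__":
    print(build_discard_table("ABC", ["A00012", "A00040", "C00007"]))
```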

7.4.2.  S/SSQ  TAPE  MAKE  UTILITY  and  LOGIC  TAPE  COMBINE  UTILITY

These  utilities  are  a  set  of  tape  handling  routines  which  copy  and/or 
skip  non-formatted  records  from  one  tape  to  another.  Subroutine  SKIP(I,  J) 
skips  I  records  on  logical  tape  J  (where  J  =  1  for  physical  tape  2;  10  for 
physical  tape  5;  or  20  for  physical  tape  6).  Subroutine  COPI(I,  J)  copies  I 
records  from  tape  J  to  a  tape  mounted  on  physical  tape  drive  6.  Subroutine 
CLEAR(I,  J)  writes  54  null  records  for  each  class  symbol  in  the  design  set 
from  class  index  I  to  class  index  J.  Subroutine  WRTHEAD  writes  an  S/SSQ 
header  record  on  tape  6.  Subroutine  COPY(I,  J)  copies  54  records  for  each 
of  I  classes  from  logical  tape  J. 

SECTION 8

CONCLUSIONS  AND  RECOMMENDATIONS 


The importance of careful data collection and editing to the
generation of good logic should not be underestimated. Due to mislabeling, noise,
and drop-outs caused by ink-camera incompatibility, many characters were
unsuitable for inclusion in the design set data base. The system for
correcting these faults (the Data Base Editing Package) by re-labeling,
noise elimination, or deletion was of great importance in arriving at a
usable data base.

The  alpha-numeric  recognition  system  with  reject  strategy  A 
achieved  recognition  rates  comparable  to  humans.  Most  substitutions 
occurred  when  the  substituted  character  looked  like  a  character  in  the 
decision  class  due  to  breaks  in  the  characters  or  to  similarity  of  the  shape 
of  the  character  with  the  decision  class.  Some  errors,  however,  were  due 
to  insufficient  sample  size  in  the  design  class  in  the  particular  sort  class 
to  which  the  character  belonged.  We  recommend  strengthening  the  logic 
by  increasing  the  number  of  samples  of  a  given  character  class  in  those  sort 
classes  where  it  is  needed. 

The existence of confusion pairs such as Z and 2, S and 5, 0 and
O, makes the totally unconstrained alpha-numeric character set an
impossible set upon which to achieve practical recognition rates. We recommend
the adoption of some constraints (e.g., slashing zeros and Z's, and
insisting that the upper right horizontal bar on the 5 not be curved down in
confusion with an S) so that the confusion between certain character pairs
is removed. The ANSI set of guidelines for handprinting may be used as
a fairly rigid set of constraints, or a less stringent set of constraints which
attempts only to differentiate characters in the confusion pairs may be
designed.


The  amount  of  constraint  depends  upon  the  specific  application 
chosen.  If  a  controlled  group  of  people  generates  the  handprinting,  more 
constraints  can  be  placed  on  the  printing  since  training  the  people  to  follow 
the  constraints  would  be  possible.  If  the  input  to  the  recognition  system 
comes  from  an  uncontrolled  environment,  such  as  the  general  public, 
constraints  must  be  kept  to  a  minimum. 

The amount of constraint will dictate the parameters of the
machine. If mild constraints are chosen, then constraints must be
effectively "built in" to the recognition logic, resulting in a "tight" machine
which rejects a character unless it satisfies the built-in constraints. If
rigid constraints are chosen and followed, then the machine can be of the
"open" variety, where more variability in character shapes is permitted.

The  technique  has  been  tested  and  proven  to  provide  a  capability 
comparable  to  human  recognition.  As  in  any  development  program,  it  is 
difficult  to  proceed  beyond  a  certain  point  without  a  specific  application. 
Furthermore, refinement of the technique will only be accomplished via a
specific  problem  application. 

We recommend that a specific application be chosen and appropriate
constraints be adopted so that a solution tailored to the character set
with the particular constraints may be achieved.